Job Title: Data Scientist (Healthcare)
Location: 100% Remote
Visa: USC, GC, H-1B Transfer, EAD/W2
Interview Process:
- Recruiter screen (skills & domain fit)
- Technical deep dive (SQL, modeling, case study)
- Practical exercise (notebook or take-home)
- Stakeholder panel (communication & healthcare use-case discussion)
About the Role: We’re looking for a hands-on Data Scientist to solve real-world business problems in healthcare/health insurance using machine learning and modern cloud tooling. You’ll own the full lifecycle—from problem framing and data wrangling to modeling, deployment support, and stakeholder storytelling.
Top Skills: Python/R, SQL, Spark/Databricks, Azure ML or Vertex AI, MLflow, Git/GitHub, Jupyter/Databricks notebooks, Power BI/Tableau, Azure/GCP services.
What You’ll Do
- Partner with business/clinical stakeholders to define measurable use cases (e.g., risk & cost forecasting, readmission and length-of-stay (LOS) prediction, member churn, fraud/waste/abuse).
- Explore, cleanse, and engineer tabular data from claims, eligibility, EMR, utilization, and care-management sources.
- Build and evaluate models using regression, classification, time series, clustering, and decision trees/GBMs; pilot deep learning where it adds value.
- Productionize models with data & MLOps teams: feature pipelines (Spark/Databricks), model packaging, monitoring for drift and decay, and A/B testing.
- Write clear analyses, dashboards, and exec-ready presentations; translate findings into actions and ROI.
- Ensure privacy and compliance (HIPAA, PHI protection), data governance, and reproducible research (git, notebooks, experiment tracking).
- Contribute to LLM initiatives: prompt engineering, RAG, basic fine-tuning, and experiments with agentic AI frameworks for workflow automation.
Required Qualifications
- 6–8 years of hands-on data science experience delivering business impact (healthcare/health insurance experience is a big plus).
- Strong skills in handling and manipulating tabular data.
- Solid SQL and proficiency in at least one statistical programming language (Python or R).
- Working knowledge of MS Office (Excel/PowerPoint/Access) for quick analyses and stakeholder comms.
- Experience with Big Data tools: HDFS, Hive, Spark, MapR-DB (or equivalent).
- Practical experience with statistical & ML techniques: linear/logistic regression, time series, clustering, decision trees, tree ensembles (XGBoost/LightGBM).
- Cloud & ML platforms: Databricks, Azure ML and/or Google Vertex AI (pipelines, model registry, deployment workflows).
- Exposure to prompt engineering, LLM fine-tuning (LoRA/PEFT or platform-native), and agentic AI concepts.
- Clear, concise communicator; able to convert ambiguity into structured analysis and decisions.
- BS/MS in Computer Science, Statistics, Applied Math, Engineering, or related field (or equivalent experience).