Lead Data Scientist - Causal Inference
$180,000-$210,000
New York Metro Area - Hybrid
Our client is on a mission to revolutionize healthcare quality by leveraging advanced data science and analytics. They partner with healthcare providers, payers, and employers to improve patient outcomes through cutting-edge technology and evidence-based insights. With strategic partnerships with major organizations and significant recent investment, they are poised to make a transformative impact on healthcare delivery nationwide. Their work focuses on ensuring patients receive the highest standard of care, starting with diagnostic accuracy and expanding into broader healthcare analytics.
THE ROLE
We are seeking a highly skilled Lead Data Scientist with a strong focus on causal inference who specializes in evaluating the impact of healthcare programs through rigorous claims data analysis. The ideal candidate will have strong programming expertise (R/Spark/SQL/Python) and a deep understanding of medical claims data. In this role, you will contribute to developing scalable data pipelines, optimizing codebases, and conducting advanced statistical modeling to support data-driven decision-making.
RESPONSIBILITIES
- Work with large-scale healthcare datasets, including longitudinal claims data, to assess the relationships between healthcare quality and patient outcomes
- Design, maintain, and improve data science pipelines that generate modeling datasets, run statistical models, and produce business reports
- Enhance the efficiency, reproducibility, and scalability of data ETL and statistical code while addressing technical challenges as they arise
- Conduct targeted analyses to uncover business and clinical insights that inform strategy and decision-making
- Apply statistical methodologies, including Generalized Linear Models (GLMs) and causal inference techniques such as difference-in-differences analysis, to evaluate program effectiveness and ROI
- Prepare documentation, presentations, and insights for both internal stakeholders and external clients
YOUR BACKGROUND
- PhD in Computer Science, Statistics, Biostatistics, Economics, Data Science, Applied Mathematics, or a related field
- Proficiency in R, Spark, SQL, and Python for data science applications, with a strong emphasis on R
- Experience with scalable code development and collaborative coding environments
- Ability to troubleshoot complex data pipelines and statistical code
- Experience working with medical and claims data, including familiarity with ICD codes and EHR/EMR data
- Knowledge of Generalized Linear Models (GLMs), mixed models, and longitudinal data analysis
- Strong collaborative mindset and ability to work in fast-paced, team-oriented environments
- Exposure to causal inference techniques (e.g., propensity score matching, difference-in-differences) is a plus
- Experience in payer organizations, healthcare consulting, or client-facing analytics roles is a plus
- Experience applying machine learning models, including classification, regression, clustering, and anomaly detection, to healthcare datasets
HOW TO APPLY
If you believe you are a good fit given the above qualifications, send your resume to Grace via the link below.
KEYWORDS
Data Science, Healthcare Analytics, Medical Claims Data, Statistical Modeling, Causal Inference, R, Spark, SQL, Python, Generalized Linear Models, Machine Learning, Healthcare Quality Improvement, Data Pipelines, ETL, Big Data, Data Engineering, AI in Healthcare