Job Description
Senior Data Scientist | Remote
Product Experimentation & Evaluation (LLMs & AI)
Hiring on behalf of our client in the AI/Technology sector
We are seeking a senior-level Data Scientist to support our client in advancing their AI-powered products through robust experimentation and evaluation practices. This role is central to developing and optimizing large language model (LLM) applications, working at the intersection of product, engineering, and trust & safety teams.
You will lead end-to-end experimentation initiatives, define LLM evaluation frameworks, and drive high-impact product improvements. The ideal candidate has worked in startup-style environments as well as with large-scale experimentation systems, and is comfortable influencing both technical and non-technical stakeholders.
Key Responsibilities
- Own experimentation processes end to end, from hypothesis generation and metric design through experiment execution (A/B, multivariate, sequential testing) to actionable insights.
- Develop and maintain evaluation frameworks for LLM-powered features, focusing on correctness, consistency, safety, hallucination detection, and bias/fairness.
- Build predictive models and heuristics to enhance AI and NLP-based product experiences.
- Collaborate with model engineers and prompt designers to explore prompt strategies, fine-tuning, model selection, and failure mode analysis.
- Automate experiment pipelines, including dashboards, instrumentation, monitoring, and alerting, to ensure data integrity and rapid feedback loops.
- Apply causal inference techniques and observational study methods when randomized experiments are infeasible.
- Translate insights into product recommendations and influence decision-making across the product lifecycle.
- Lead data initiatives in fast-paced, startup-like environments, with a strong focus on iteration speed, accuracy, and scalability.
- Contribute to defining experimentation strategies at scale, supporting a culture of evidence-based product development.
- Mentor junior team members and help shape best practices in experimentation and AI evaluation.
Requirements
Must-Have
- 8–12+ years of experience in Data Science or Machine Learning, with a strong focus on experiment design and product analytics.
- Demonstrated ability to drive experimentation in both startup and scaled enterprise environments.
- Experience leading cross-functional teams, setting strategies, and executing roadmaps.
- Proficiency in statistical analysis, causal inference, and robust metric design.
- Deep experience in LLMs / NLP / AI, including working with prompts, model behavior, and evaluation.
- Strong programming skills in Python, solid SQL, and familiarity with building and deploying analytic or ML pipelines.
- Excellent communication skills, with the ability to translate complex data findings into business or product outcomes.
Nice-to-Have
- Experience fine-tuning LLMs and working with multiple model providers or APIs.
- Hands-on experience with experiment platforms or internal tooling for model evaluation.
- Familiarity with voice, ASR, or other multi-modal AI applications.
Working Terms
- Must be available to work during US business hours, through 6 p.m. Eastern Time (ET).
- Candidates should have their own remote work setup, including necessary equipment and internet access.