Job Title: Senior Data Scientist – Generative AI / RAG
Location: Remote (US only)
Employment Type: Contract-to-hire (C2C not available)
Overview:
We are hiring a Senior Data Scientist with deep expertise in Generative AI, particularly Retrieval-Augmented Generation (RAG) and complex LLM system design. In this role, you'll help shape and build AI-powered products from the ground up, working on multiple greenfield initiatives that combine structured and unstructured data to produce highly contextual, dynamic insights.
Key Responsibilities:
- Design, build, and iterate on data science pipelines that leverage RAG architectures for dynamic information retrieval and generation.
- Develop multi-step LLM workflows, including preprocessing, iterative prompt chains, and post-processing layers.
- Work with search engines that support vector retrieval, such as Solr or OpenSearch, tuning chunking strategies, relevance, and retrieval speed.
- Apply prompt engineering to decompose user queries into structured components, enabling more accurate and targeted responses.
- Define and apply LLM evaluation metrics to measure and improve response quality and relevance and to reduce hallucination.
- Collaborate across engineering, product, and design teams to align models with product requirements and user experience.
- Support the design and implementation of intelligent features across multiple AI-powered products, including providing input during early-stage design.
- Communicate technical strategies, blockers, and results clearly to both technical and non-technical stakeholders.
Required Qualifications:
- 5+ years of experience in Data Science, with 2+ years focused on LLM applications or Generative AI.
- Expertise in RAG systems, including architecture, pipeline design, and data integration strategies.
- Practical experience with vector search engines (e.g., Solr, OpenSearch), including tuning for semantic search and document chunking.
- Strong understanding of OpenAI models and/or other major cloud-based LLM offerings.
- Demonstrated ability in prompt engineering, particularly for multi-turn and complex queries.
- Experience designing multi-stage LLM chains that involve preprocessing, intermediate reasoning, and iterative improvement.
- Familiarity with LLM evaluation methodologies and benchmarks.
- Excellent communication skills, especially in cross-functional and collaborative environments.
Preferred Qualifications:
- Experience in early-stage or greenfield product development.
- Strong proficiency in Python and familiarity with LLM frameworks such as LangChain.
- Experience working with large-scale unstructured data and designing systems for production environments.
Sample Day-in-the-Life:
You might start your day refining the architecture for a feature that parses multi-intent user queries, then shift to evaluating chunking and retrieval strategies in OpenSearch. Later, you might work with the product and design teams to define how a new intelligent feature should behave end-to-end, making sure the underlying LLM chain supports that vision through robust prompt design and evaluation metrics.