In this role as a Principal Data Scientist, you will:
- Collaborate with product leadership to identify, elaborate and prioritize projects.
- Lead a team of data scientists and developers to deliver AI-enabled microservices in collaboration with content and product engineering teams.
- Define reference architectures for answering complex questions, including through code interpretation, LLM tool use and secondary data science models.
- Coach and train a team of data scientists and developers to use these architectures.
- Stay up to date with advances in the field and deliver periodic presentations on them to internal teams.
- Serve as the company-wide expert in one or more complex technical areas (e.g., entity mastering, knowledge graphs, search optimization, RAG).
Requirements:
- Experience developing AI / ML applications and delivering data-driven solutions
- Graduate degree in Computer Science, Engineering, Statistics or a related quantitative discipline, or equivalent work experience
- Substantial depth and breadth in NLP, Deep Learning, Generative AI and other state-of-the-art AI / ML techniques
- Excellent knowledge of high-level programming languages (Python, Java, or C++) and core data science libraries such as Pandas and NumPy
- Deep understanding of CS fundamentals, computational complexity and algorithm design
- Experience building large-scale distributed systems in an agile environment and the ability to build quick prototypes
- Experience leading a portfolio of complex data science projects and mentoring junior team members
Preferred Qualifications:
- PhD in Computer Science with an AI / ML research focus and publications in top-tier journals and conferences
- Knowledge of the healthcare domain and experience applying AI to healthcare data
- Experience with AWS, especially in relation to ML workflows with SageMaker, serverless compute and storage such as S3 and Snowflake
- Experience with LLMs, prompt engineering, retrieval-augmented generation, model fine-tuning and knowledge graphs