About The Role
HiCounselor is assisting one of our clients in hiring a Lead Data Scientist – Healthcare to spearhead data science initiatives in the healthcare domain. The ideal candidate will be responsible for leading end-to-end projects, applying advanced analytics and AI to healthcare data, and delivering insights that drive strategic decision-making. This role also involves mentoring team members, collaborating with stakeholders, and ensuring data-driven solutions are effectively integrated to support business objectives and improve healthcare outcomes.
Visa Sponsorship: Not available
Key Responsibilities
- Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, LLaMA, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
- Architect and implement GraphRAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
- Design, train, and optimize semantic and dense vector embeddings for document understanding, search, and retrieval.
- Develop semantic retrieval systems with advanced document segmentation and indexing strategies.
- Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training.
- Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
- Collaborate with cross-functional teams to translate business needs into AI-driven solutions and deploy them in production environments.
Preferred Qualifications
- PhD or Master’s degree in Computer Science, Machine Learning, or a related field.
- 8+ years of experience in applied AI/ML, with a strong track record of delivering production-grade models.
- Deep expertise in:
  - LLM training and fine-tuning (e.g., GPT, LLaMA, Mistral, Qwen)
  - Graph-based retrieval systems (GraphRAG, knowledge graphs)
  - Embedding models (e.g., BGE, E5, SimCSE)
  - Semantic search and vector databases (e.g., FAISS, Weaviate, Milvus)
  - Document segmentation and preprocessing (OCR, layout parsing)
  - Distributed training frameworks (NCCL, Horovod, DeepSpeed)
  - High-performance networking (InfiniBand, RDMA)
  - Model fusion and ensemble techniques (stacking, boosting, gating)
  - Optimization algorithms (Bayesian, Particle Swarm, Genetic Algorithms)
  - Symbolic AI and rule-based systems
  - Meta-learning and Mixture of Experts architectures
  - Reinforcement learning (e.g., RLHF, PPO, DPO)
Bonus Skills
· Experience with healthcare data and medical coding systems (e.g., CPT, ICD-10-CM, ICD-10-PCS).
· Familiarity with regulatory and compliance frameworks in AI deployment.
· Contributions to open-source AI projects or published research, and/or the ability to take research papers from proof of concept (PoC) to production.
Additional Preferred Qualifications
· M.S. or PhD in a computational domain
· Publication history in deep learning or statistics
· Experience with SQL databases and query languages
· Experience in AzureML, AWS, or cluster computing architectures
· Experience with hybrid NLP solutions that combine symbolic and machine learning approaches
· Experience with XML and XSLT
· Healthcare domain background
· IT full-stack engineering experience
Pay: A reasonable estimate of the current range is $150,000 - $165,000 per year. In addition, you may be eligible for a discretionary bonus for the current performance period.
Work Location: Remote (CST preferred)