Job Title: Data Scientist (In-Person Interview)
Location: Remote
Duration: Long-Term
Visa: Any visa is fine.
Interview process: 1st round virtual; 2nd round face-to-face (mandatory).
End Client: Working on an implementation project.
Note: At least 4 years of experience as a Data Scientist is required.
Note: When sharing your resume, please include a write-up summarizing your Data Scientist experience.
Please send resumes to Akhil@tror.ai
Key Responsibilities
- Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, Llama, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
- Architect and implement Graph RAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
- Design, train, and optimize semantic and dense vector embeddings for document understanding, search, and retrieval.
- Develop semantic retrieval systems with advanced document segmentation and indexing strategies.
- Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training.
- Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
- Collaborate with cross-functional teams to translate business needs into AI-driven solutions and deploy them in production environments.
- Design and implement analytical frameworks.
- Develop predictive and prescriptive models.
Preferred Qualifications
- PhD or Master’s degree in Computer Science, Machine Learning, or a related field.
- 4+ years of experience in applied AI/ML, with a strong track record of delivering production-grade models.
Deep Expertise In
- LLM training and fine-tuning (e.g., GPT, Llama, Mistral, Qwen)
- Graph-based retrieval systems (Graph RAG, knowledge graphs)
- Embedding models (e.g., BGE, E5, SimCSE)
- Semantic search and vector databases (e.g., FAISS, Weaviate, Milvus)
- Document segmentation and preprocessing (OCR, layout parsing)
- Distributed training frameworks (NCCL, Horovod, DeepSpeed)
- High-performance networking (InfiniBand, RDMA)
- Model fusion and ensemble techniques (stacking, boosting, gating)
- Optimization algorithms (Bayesian, Particle Swarm, Genetic Algorithms)
- Symbolic AI and rule-based systems
- Meta-learning and Mixture of Experts architectures
- Reinforcement learning (e.g., RLHF, PPO, DPO)
- Experience applying causal inference techniques (e.g., causal impact analysis, uplift modeling, DoWhy) to marketing and engagement analytics.
- Exercise independent judgment in methods, techniques, and evaluation criteria on data science projects, overseeing the end-to-end process from problem definition to model implementation.
- Proficiency with programming languages such as Python, R, and SQL.
- Strong background in predictive modeling, classification, segmentation, and optimization.
- Extensive hands-on experience working in a cloud environment.
Bonus Skills
- Familiarity with regulatory and compliance frameworks in AI deployment.
- Contributions to open-source AI projects or published research, and/or the ability to take research papers from proof of concept (PoC) to production.