Data Science Lead (NLP & GenAI)
Summary
We are seeking a highly experienced and innovative Data Science Lead with 8+ years of experience in core data science and at least 2 years of focused, hands-on experience in Natural Language Processing (NLP) and Generative AI (GenAI). You will lead strategic AI/ML initiatives, mentor junior data scientists, and deliver intelligent solutions that drive business value using both classical and modern machine learning techniques.
Key Responsibilities
- Lead end-to-end design and delivery of data science solutions, from problem definition to deployment.
- Design, build, and fine-tune NLP and GenAI models for tasks such as summarization, classification, question answering, translation, and chatbot applications.
- Apply statistical modeling, predictive analytics, and machine learning algorithms to structured and unstructured datasets.
- Collaborate with product, engineering, and business teams to translate high-level business problems into data science solutions.
- Ensure that all machine learning workflows are scalable, reproducible, and optimized for performance.
- Work with large-scale data processing tools and frameworks in cloud-based environments.
- Mentor junior data scientists, review their work, and collaborate with them on research and experimentation.
- Track advancements in GenAI, LLMs, and NLP frameworks and apply them to enterprise AI use cases.
Mandatory Skills
- Python: Strong proficiency in Python for data science, modeling, and scripting
- Machine Learning: Hands-on experience with classical and ensemble models (e.g., Random Forest, XGBoost)
- NLP (2+ years): Experience with transformers, tokenization, embeddings, and sentiment analysis
- GenAI & LLMs: Experience working with GPT-like models, including fine-tuning and prompt engineering
- Deep Learning (PyTorch / TensorFlow): Building and training deep learning models for NLP and other domains
- Model Deployment: Deploying models via REST APIs, Docker, or cloud-native services
- SQL & Data Manipulation: Strong ability to query, clean, and process data
- Statistical Analysis: Applied statistics, hypothesis testing, and A/B testing
- Version Control (Git): Experience using Git in collaborative environments
Nice-to-Have Skills
- Vector Databases: Experience with FAISS, Pinecone, or ChromaDB for semantic search
- RAG Architecture: Building Retrieval-Augmented Generation pipelines
- LLM Orchestration: LangChain, LlamaIndex, or similar frameworks
- Cloud Platforms (Azure/GCP/AWS): Cloud-based ML workflows, pipelines, and infrastructure
- MLOps: Model tracking, monitoring, CI/CD with MLflow, Kubeflow, etc.
- Big Data Tools: Spark, Databricks, or Hadoop ecosystem familiarity
- Experiment Tracking: Tools like Weights & Biases, MLflow
- Academic Research / Publications: Experience publishing whitepapers or research contributions
- Databricks: Hands-on experience with Databricks, preferably the Azure Databricks platform
- Delta Lake: Hands-on experience with Delta Lake, preferably on Azure Databricks and ADLS Gen2
Educational Qualifications
Master’s or PhD in Computer Science, Data Science, AI/ML, Statistics, or a related field.
Certifications (Preferred but Not Mandatory)
- Google Cloud or Microsoft Azure AI/ML certifications (e.g., Azure AI Engineer Associate, Azure Data Scientist Associate)
- Databricks Certified Machine Learning Professional
- DeepLearning.AI Generative AI certification
- Hugging Face Transformers certification