Experience: 8+ yrs
Shift: 4 AM to 1 PM
Senior Data Scientist - Information Retrieval & Generative AI
Experience: 5+ years
Design and deploy state-of-the-art RAG architectures processing petabyte-scale datasets.
Build hybrid dense/sparse retrieval pipelines that serve millions of daily queries with sub-second
latency.
Your models will directly impact product strategy and drive measurable business outcomes for our
global user base.
Responsibilities:
• Drive end-to-end ML product development from research to production deployment
• Collaborate with engineering and product teams to translate business requirements
into scalable data solutions
• Mentor junior data scientists and establish best practices for model development
• Lead breakthrough research in information retrieval and generative AI
Technical Execution
• Design and optimize transformer-based architectures for information retrieval and
generation
• Implement advanced chunking strategies for semantic search and RAG applications
• Build and maintain real-time ML pipelines processing millions of documents
• Develop production-ready models with proper monitoring, versioning, and
deployment strategies
Innovation & Research
• Research and prototype cutting-edge AI techniques in search, retrieval, and natural
language processing
• Design large-scale experiments and A/B tests to validate model performance and
business impact
• Stay current with the latest developments in GenAI and contribute to open-source
communities
• Present findings to executive leadership and influence strategic product decisions
Essential Skills
• Advanced Python Programming with expertise in pandas, scikit-learn,
TensorFlow/PyTorch
• SQL & Database Management for complex query optimization and data pipeline
design
• Machine Learning & Deep Learning with a track record of shipping ML products to
production
• Statistics & Probability including advanced statistical modeling and hypothesis
testing
• 5+ years of data science experience, including 2+ years in senior roles
Specialized Expertise
• Information Retrieval Systems - Search algorithms, ranking, and relevance
optimization
• Generative AI & LLMs - Prompt engineering, fine-tuning, and deployment at scale
• Content Chunking Strategies - Document processing and semantic segmentation
for RAG systems
• Vector Databases & Search Libraries - Hands-on experience with Pinecone, Weaviate,
FAISS, or OpenSearch
• Transformer Models - Deep understanding of BERT, GPT, and T5 architectures
Advanced Technical Skills
• RAG (Retrieval-Augmented Generation) implementation and optimization
• Named Entity Recognition (NER) at enterprise scale
• Cloud platforms (AWS, Azure, GCP) for ML deployment
• MLOps tools and practices (Docker, Kubernetes, model registries)