Required Data Scientist Qualifications:
- Bachelor’s Degree or foreign equivalent; will also consider three years of progressive experience in the specialty in lieu of every year of education.
- At least 8 years of Information Technology experience
- At least 4 years of hands-on experience with GenAI / Agentic AI and data science with machine learning
- Strong proficiency in Python programming.
- Experience deploying GenAI applications with an agent framework such as LangGraph, AutoGen, or CrewAI.
- Experience deploying the GenAI stacks/services provided by platforms such as AWS, GCP, Azure, and IBM Watson
- Experience in Generative AI, working with multiple Large Language Models, and implementing advanced RAG-based solutions.
- Experience processing/ingesting unstructured data such as PDFs, HTML, image files, and audio-to-text transcripts.
- Experience with Hybrid Retrieval, Semantic Chunking, Metadata Filtering, and Vector Search Optimization
- Experience with advanced LLM concepts such as Prompt Compression, Fine-Tuning, and Caching.
- Experience working with multimodal models integrating image, text, or audio into GenAI workflows.
- Experience working with Vector Databases (such as FAISS, Pinecone, Weaviate, or Azure AI Search).
- Experience with model evaluation tools such as DeepEval, FMEval, RAGAS, and Bedrock model evaluation, including human-in-the-loop feedback loops.
- Experience with LLMOps practices including prompt versioning, caching, observability, cost tracking, and production deployment of LLMs.
- Strong understanding of AI governance (e.g., GDPR, explainability), data privacy, model safety (e.g., hallucination, toxicity, bias), and enterprise-grade compliance requirements.
- Experience with data gathering, data quality, system/microservices architecture, and coding best practices
- Experience with Lean / Agile development methodologies
Preferred Data Scientist Qualifications:
- 4 years of hands-on experience with more than one programming language: Python, R, Scala, Java, SQL
- Hands-on experience with CI/CD pipelines and DevOps tools like Jenkins, GitHub Actions, or Terraform.
- Proficiency in NoSQL and SQL databases (PostgreSQL, MongoDB, CosmosDB, DynamoDB).
- Deep Learning experience with CNNs, RNNs, LSTMs, and the latest research trends
- Experience in Python AI/ML frameworks such as TensorFlow, PyTorch, or LangChain.
- Strong understanding of, and experience with, LLM fine-tuning and local deployment of open-source models
- Proficiency in building RESTful APIs using FastAPI, Flask, or Django.
- Experience with perception (e.g., computer vision), time series data, and text analysis
- Big Data experience strongly preferred: HDFS, Hive, Spark, Scala
- Data visualization tools such as Tableau; query languages such as SQL and Hive
- Good applied statistics skills, such as distributions, statistical testing, regression, etc.
- Exposure to front-end / full-stack integration (React / Angular, TypeScript, REST APIs, GraphQL, event-driven architectures, etc.)