JOB DESCRIPTION
Senior Data Scientist
Speech, Voice & Conversational AI
Department: Data Science & AI
Experience: 12–15 Years
Role Overview
We are seeking a highly experienced Senior Data Scientist (Speech, Voice & Conversational AI) to lead the architecture, design, and delivery of next-generation voice and speech AI solutions. This role sits at the intersection of deep machine learning expertise and practical product engineering, driving end-to-end voice AI capabilities across Firstsource’s service lines.
The ideal candidate brings 12–15 years of progressive experience in data science with a strong specialization in speech and voice technologies, along with hands-on expertise in Generative AI, Agentic AI frameworks, and modern voice pipeline tooling. You will act as a technical thought leader, shaping our voice AI strategy, mentoring teams, and collaborating with cross-functional stakeholders to deliver production-grade solutions at scale.
Key Responsibilities
Voice & Speech AI Architecture
- Design and own the end-to-end architecture for voice AI solutions, including real-time speech-to-text (STT), text-to-speech (TTS), voice-to-voice, speaker diarization, emotion detection, and voice biometrics.
- Evaluate, benchmark, and integrate leading speech platforms and APIs such as Google Cloud Speech, Amazon Transcribe, Azure Speech Services, Whisper (OpenAI), Deepgram, AssemblyAI, ElevenLabs, and PlayHT.
- Build robust voice pipelines that handle noise cancellation, language identification, accent adaptation, and real-time streaming at production scale.
Generative AI & Agentic AI
- Architect and deploy GenAI-powered conversational agents leveraging Large Language Models (LLMs) such as GPT-4, Claude, Gemini, and open-source alternatives (LLaMA, Mistral).
- Design Agentic AI workflows using frameworks such as LangChain, LangGraph, CrewAI, AutoGen, and Semantic Kernel to build multi-step, tool-using voice agents.
- Implement Retrieval-Augmented Generation (RAG) pipelines with vector databases (Pinecone, Weaviate, Qdrant, Chroma) for context-aware voice assistants.
- Drive prompt engineering strategies and fine-tuning approaches (LoRA, QLoRA, RLHF) to optimize LLM performance for speech-centric use cases.
Solution Design & Delivery
- Lead solution design workshops with clients and internal stakeholders to translate business requirements into scalable voice AI architectures.
- Define technical roadmaps, establish best practices, and create reusable solution accelerators for voice and conversational AI.
- Own proof-of-concept (POC) development through to production deployment, working closely with MLOps and engineering teams.
Leadership & Mentoring
- Mentor and upskill a team of data scientists and ML engineers on speech AI and GenAI best practices.
- Represent Firstsource as a subject-matter expert in voice AI at internal reviews, client presentations, and industry forums.
- Stay current on rapidly evolving GenAI, speech, and agentic AI research and translate insights into actionable opportunities.
Technical Skills & Tooling
Domain: Required Proficiency
- Speech-to-Text (STT): Whisper, Google Cloud Speech, Azure Speech, Amazon Transcribe, Deepgram, AssemblyAI, Kaldi
- Text-to-Speech (TTS): ElevenLabs, PlayHT, Azure Neural TTS, Amazon Polly, Google WaveNet, Tortoise TTS, Bark
- Voice-to-Voice: Real-time duplex pipelines, WebRTC integration, voice cloning, prosody transfer, streaming architectures
- LLM & GenAI: GPT-4/4o, Claude, Gemini, LLaMA, Mistral, fine-tuning (LoRA/QLoRA), RLHF, prompt engineering
- Agentic AI Frameworks: LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, function calling, tool-use patterns
- RAG & Vector DBs: Pinecone, Weaviate, Qdrant, Chroma, FAISS, embedding models, hybrid search
- ML / Deep Learning: PyTorch, TensorFlow, Transformers (HuggingFace), audio feature engineering (MFCCs, spectrograms)
- Cloud & MLOps: AWS / Azure / GCP, Docker, Kubernetes, MLflow, model serving (Triton, TorchServe, vLLM)
- Programming: Python (advanced), SQL, familiarity with Rust/C++ for performance-critical audio processing
- Telephony & Contact Center: Twilio, Genesys, Amazon Connect, SIP/VoIP protocols, CCAI (Google Contact Center AI)
Qualifications & Experience
- 12–15 years of progressive experience in Data Science, Machine Learning, or AI Engineering, with at least 5 years focused on speech, voice, or audio ML.
- Master’s or Ph.D. in Computer Science, Electrical Engineering, Computational Linguistics, or a related quantitative discipline.
- Demonstrated track record of architecting and deploying production-grade speech/voice AI systems at scale.
- Deep hands-on expertise with at least two major cloud speech platforms (Google, Azure, AWS) and open-source speech models.
- Strong understanding of Generative AI fundamentals, including transformer architectures, attention mechanisms, tokenization, and inference optimization.
- Proven experience building Agentic AI solutions with multi-step reasoning, tool use, and autonomous decision-making.
- Published research, patents, or conference presentations in speech/NLP are a strong plus.
Preferred Qualifications
- Experience in BPO, contact center, or customer experience transformation using voice AI.
- Familiarity with speech analytics, call quality monitoring, and agent assist technologies.
- Hands-on experience with voice cloning, voice conversion, and neural codec models (e.g., SoundStream, EnCodec).
- Contributions to open-source speech or GenAI projects.
- Experience with real-time, low-latency voice-to-voice systems for production telephony.
Why Firstsource?
At Firstsource, we make it happen. You will join a team that is deeply committed to using AI and intelligent automation to transform customer experience at global scale. This role offers the opportunity to shape the future of voice AI in one of the world’s leading business process companies, with access to real-world data, enterprise clients, and a culture that rewards bold thinking and rapid experimentation.
Firstsource is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.