About The Role
The role is designed for a software engineer who has moved beyond basic prompt engineering and understands how to architect, deploy, and scale production-grade generative AI systems. The team focuses on building robust Retrieval-Augmented Generation (RAG) pipelines, multi-agent orchestration, and fine-tuning workflows that deliver consistent, reliable outputs in high-throughput environments.
Working alongside backend engineers and data scientists, the engineer in this role will own the integration of state-of-the-art LLMs into core product workflows, directly tackling model latency, cost optimization, evaluation metrics, and hallucination mitigation.
Key Responsibilities
- Design, implement, and optimize production-grade Retrieval-Augmented Generation (RAG) pipelines using LangChain, LlamaIndex, or native integrations
- Build and maintain scalable vector search infrastructure using systems like Pinecone, Milvus, Qdrant, or pgvector, focusing on high-recall indexing and low-latency retrieval
- Implement systematic LLM evaluation and observability frameworks using tools like Phoenix, LangSmith, or custom LLM-as-a-judge pipelines to track drift and accuracy
- Deploy, serve, and optimize open-source LLMs (Llama, Mistral) using frameworks like vLLM, TGI, or TensorRT-LLM on AWS or GCP
- Apply parameter-efficient fine-tuning (PEFT) techniques such as LoRA and QLoRA to domain-specific datasets, adapting models for specialized enterprise tasks
- Collaborate with backend engineering teams to expose robust, asynchronous APIs serving real-time model outputs with built-in rate-limiting and fallback mechanisms
What We Are Looking For
- 3-6 years of professional software engineering experience, with at least 1.5 years of hands-on experience building and deploying generative AI systems in production
- Advanced proficiency in Python, including experience with asynchronous programming (asyncio), FastAPI, and writing highly optimized, testable code
- Deep technical understanding of transformer architectures, attention mechanisms, embedding spaces, and context-window management
- Practical experience setting up and tuning vector databases and managing semantic search pipelines at scale
- BS or MS in Computer Science, Data Science, Mathematics, or a closely related technical field
- Bonus: Experience with advanced prompting methodologies, agentic frameworks (CrewAI, AutoGen), or containerization and orchestration via Docker and Kubernetes