We are seeking skilled LLaMA Software Engineers to join a high-impact team building Large Language Model (LLM) solutions for our strategic client. You will implement, fine-tune, deploy, and integrate the client’s open-source LLaMA (Large Language Model Meta AI) models in real-world production environments.
In this role, you will collaborate closely with the client’s AI/ML researchers, product teams, and infrastructure engineers to develop scalable, safe, and responsible generative AI applications.
⸻
Key Responsibilities
• Design, develop, and deploy applications using LLaMA and other open-source LLM architectures.
• Fine-tune and adapt large models for specific tasks using techniques such as reinforcement learning, prompt engineering, LoRA, or QLoRA.
• Collaborate with client stakeholders to integrate LLMs into products like search, recommendation, support automation, and content generation.
• Build robust APIs, pipelines, and tools to support LLM inference, performance monitoring, and scalability.
• Address bias and toxicity in LLM outputs, and manage inference latency and cost, in compliance with the client’s responsible AI guidelines.
• Contribute to open-source efforts or internal model innovation, benchmarking, and performance improvements.
⸻
Required Qualifications
• Bachelor’s or Master’s degree in Computer Science, Machine Learning, Artificial Intelligence, or a related field.
• 3–7 years of experience in software engineering with at least 1–2 years working on LLMs, transformers, or generative AI.
• Hands-on experience with LLaMA, GPT, PaLM, Mistral, or similar models using libraries like Hugging Face Transformers, PyTorch, DeepSpeed, Ray, or Accelerate.
• Solid knowledge of fine-tuning techniques, distributed training, and inference optimization (e.g., quantization, model pruning).
• Proficiency in Python and strong engineering practices (version control, CI/CD, unit testing).
• Familiarity with the client’s ecosystem or equivalent (FAIR, PyTorch/xFormers, FBGEMM, etc.) is a plus.
⸻
Preferred Skills
• Experience deploying LLMs in production environments at scale.
• Familiarity with MLOps platforms (e.g., MLflow, SageMaker, Weights & Biases).
• Experience with multimodal models, agent frameworks (AutoGPT, LangChain, Open Agents), or retrieval-augmented generation (RAG) pipelines.
• Background in privacy-preserving ML, RLHF, or embedding-based search.