We are seeking skilled LLaMA Software Engineers to join a high-impact team working on cutting-edge Large Language Model (LLM) solutions for our strategic client. You will work on the implementation, fine-tuning, deployment, and integration of the client’s open-source LLaMA (Large Language Model Meta AI) models in real-world production environments.

As part of this role, you will collaborate closely with the client’s AI/ML researchers, product teams, and infrastructure engineers to develop scalable, safe, and responsible generative AI applications.

⸻

Key Responsibilities

• Design, develop, and deploy applications using LLaMA and other open-source LLM architectures.

• Adapt and optimize large models for specific tasks using parameter-efficient fine-tuning (LoRA, QLoRA), reinforcement learning, and prompt engineering.

• Collaborate with client stakeholders to integrate LLMs into products like search, recommendation, support automation, and content generation.

• Build robust APIs, pipelines, and tools to support LLM inference, performance monitoring, and scalability.

• Address bias and toxicity in LLM outputs in compliance with the client’s responsible AI guidelines, and reduce inference latency and cost.

• Contribute to open-source efforts or internal model innovation, benchmarking, and performance improvements.

⸻

Required Qualifications

• Bachelor’s or Master’s degree in Computer Science, Machine Learning, Artificial Intelligence, or a related field.

• 3–7 years of software engineering experience, including 1–2 years working on LLMs, transformers, or generative AI.

• Hands-on experience with LLaMA, GPT, PaLM, Mistral, or similar models using libraries like Hugging Face Transformers, PyTorch, DeepSpeed, Ray, or Accelerate.

• Solid knowledge of fine-tuning techniques, distributed training, and inference optimization (e.g., quantization, model pruning).

• Proficiency in Python and strong engineering practices (version control, CI/CD, unit testing).

• Familiarity with the client’s ecosystem or equivalent (FAIR, PyTorch/xFormers, FBGEMM, etc.) is a plus.

⸻

Preferred Skills

• Experience deploying LLMs in production environments at scale.

• Familiarity with MLOps platforms (e.g., MLflow, SageMaker, Weights & Biases).

• Experience with multimodal models, agent frameworks (AutoGPT, LangChain, Open Agents), or retrieval-augmented generation (RAG) pipelines.

• Background in privacy-preserving ML, RLHF, or embedding-based search.