LLM / GenAI Engineer

Location

Remote restrictions apply

Salary Estimate

N/A

Seniority

N/A

Tech stacks

Software Development

Data

Permanent role

a day ago

About The Role

The role is designed for a software engineer who has moved beyond basic prompt engineering and understands how to architect, deploy, and scale production-grade generative AI systems. The team focuses on building robust Retrieval-Augmented Generation (RAG) pipelines, multi-agent orchestrations, and fine-tuning workflows that deliver deterministic, reliable outputs in high-throughput environments.

Working alongside backend engineers and data scientists, this role will own the integration of state-of-the-art LLMs into core product workflows, directly tackling the challenges of model latency, cost optimization, evaluation metrics, and hallucination mitigation.

Key Responsibilities

Design, implement, and optimize production-grade Retrieval-Augmented Generation (RAG) pipelines using LangChain, LlamaIndex, or native integrations
Build and maintain scalable vector search infrastructure using systems such as Pinecone, Milvus, Qdrant, or pgvector, focusing on high-recall indexing and low-latency retrieval
Implement systematic LLM evaluation and observability frameworks using tools such as Phoenix, LangSmith, or custom LLM-as-a-judge pipelines to track drift and accuracy
Deploy, serve, and optimize open-source LLMs (Llama, Mistral) using frameworks like vLLM, TGI, or TensorRT-LLM on AWS or GCP
Execute parameter-efficient fine-tuning (PEFT) with methods such as LoRA and QLoRA on domain-specific datasets to adapt models for specialized enterprise tasks
Collaborate with backend engineering teams to expose robust, asynchronous APIs serving real-time model outputs with built-in rate-limiting and fallback mechanisms

What We Are Looking For

3-6 years of professional software engineering experience, with at least 1.5 years of hands-on experience building and deploying generative AI systems in production
Advanced proficiency in Python, including experience with asynchronous programming (asyncio), FastAPI, and writing highly optimized, testable code
Deep technical understanding of transformer architectures, attention mechanisms, embedding spaces, and context-window management
Practical experience setting up and tuning vector databases and managing semantic search pipelines at scale
BS or MS in Computer Science, Data Science, Mathematics, or a closely related technical field
Bonus: Experience with advanced prompting methodologies, agentic frameworks (CrewAI, AutoGen), or containerization and orchestration via Docker and Kubernetes

About Scale.jobs
