Juliana Torrisi, Recruiter, is in direct contact with the company and can answer any questions you may have.

About the role:
We're seeking a DevOps and MLOps engineer to manage deployments, pipelines, and cloud infrastructure across AWS and GCP. This is an advanced role for someone deeply familiar with infra-as-code and deploying LLM pipelines at scale.
Core Requirements:
- Proficiency in infrastructure diagramming and architecture documentation
- 8+ years in DevOps / CloudOps, including containerized deployments
- Expert in Terraform, GitHub Actions, AWS CDK, and GCP deployment workflows
- Thorough knowledge of AWS CodePipeline
- Infrastructure ownership across AWS (IAM, ECS/Fargate, S3, Bedrock) and GCP (Cloud Run, Cloud Batch, Firestore), including hands-on management of batch jobs
- Solid experience with Docker, CI/CD pipelines, and automated testing
- Prior experience deploying generative AI models (e.g., Bedrock, Hugging Face, Gemini, Claude) using services like LiteLLM
- Expertise in security best practices: IAM policies, secrets management, intrusion detection
- Deep knowledge of cost optimization for AI workloads in production
- Experience with AWS CloudWatch, AWS Cost Explorer, or similar monitoring tools
- Experience implementing cost-tracking and model-spend monitoring (e.g., via LiteLLM)
- Familiarity with prompt management systems such as Amazon Bedrock Prompt Management
- Ability to set and manage model-specific quotas (e.g., Gemini 2.5 token and request limits)
- Ability to integrate new multimodal model endpoints (e.g., Imagen 3, Google Multimodal, grok-3-beta)
Additional Required Experience:
- Fluent in switching and deploying across model providers (OpenAI, Google, Anthropic, DeepSeek)
- Experience deploying secure HTTP(S) transports and server-sent events (SSE)
- Ability to debug image-generation, prompt-optimization, and vector-database fallbacks (e.g., switching from Weaviate to ChromaDB)
Must be fluent in:
- Multimodal pipelines and data compliance (TLS, AES-256, SOC2-aligned tools)
- Rapid model replacement and deprecation workflows (e.g., aliasing GPT-4o to GPT-4.1)
- Monitoring and logging for file processing issues across distributed systems
- Collaborating with ML and backend teams to deploy pipelines across multiple cloud environments