Arc Exclusive

Senior DevOps Engineer – Generative AI & Cloud Infrastructure - NA/US/UK

Location

Remote restrictions apply

Salary

US$140K - 180K

Min. experience

5+ years

Required skills

Kubernetes, Terraform, AWS, Load Balancing, Network

Full-time role
Posted a month ago
Actively recruiting / 110 applicants

We’re here to help you

Sole is in direct contact with the company and can answer any questions you may have.

Sole, Recruiter

Senior DevOps Engineer – Generative AI & Cloud Infrastructure

Full-Time · Remote or Hybrid · High-Impact Role

About Us

We are building a next-generation AI cloud platform combining fast LLM inference with high-performance cloud infrastructure, GPU clusters, and developer-first APIs. Our systems power mission-critical generative AI workloads across distributed data centers and cutting-edge ML hardware. We’re looking for a Senior DevOps Engineer to own and evolve the infrastructure backbone of our platform. You’ll work closely with infra, ML, and product engineering teams to design, automate, and operate reliable, scalable, and observable systems for AI workloads. If you love working at the intersection of DevOps, distributed systems, GPUs, and generative AI, this role is for you.

What You’ll Do

Design & Operate AI Cloud Infrastructure

  • Build and maintain scalable, secure, and highly available infrastructure for LLM inference, fine-tuning, and data processing.
  • Manage multi-region Kubernetes clusters running GPU-heavy workloads.
  • Implement and refine autoscaling strategies for heterogeneous GPU fleets.

Infrastructure as Code & Automation

  • Own infrastructure-as-code deployments with tools like Terraform, Helm, and Ansible.
  • Automate provisioning of compute, networking, and storage for AI clusters.
  • Build pipelines to spin up and tear down clusters for experiments, benchmarks, and customer environments.

CI/CD & Release Engineering

  • Design and maintain CI/CD pipelines for backend, ML, and infra components.
  • Implement safe rollout strategies (blue/green, canary, feature flags).
  • Collaborate with engineers to improve build times, test reliability, and deployment velocity.

Observability, Reliability & SRE

  • Build and operate observability stacks (Prometheus, Grafana, Loki/ELK, OpenTelemetry).
  • Define and monitor SLOs/SLAs for latency, availability, and error budgets across services.
  • Implement playbooks, runbooks, and incident response processes for production systems.

Security, Compliance & Best Practices

  • Implement best practices for secrets management, access control, and network security.
  • Help design secure multi-tenant environments for enterprise customers.
  • Partner with leadership to build a culture of reliability, ownership, and operational excellence.

What We’re Looking For

Must-Have

  • 4–8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
  • Strong experience operating production systems on AWS / GCP / Azure.
  • Deep experience with Kubernetes in production (cluster management, Helm, operators, networking, storage).
  • Proficiency with infrastructure-as-code (Terraform or similar).
  • Strong skills in at least one scripting/programming language (Python, Go, Bash, etc.).
  • Solid understanding of networking, load balancers, DNS, TLS, and security fundamentals.
  • Proven track record of building reliable, observable, and automated systems.

Nice-to-Have

  • Experience with GPU-based workloads and ML infrastructure (H100s, A100s, GB200s, etc.).
  • Familiarity with LLM inference stacks, ML training pipelines, or data platforms.
  • Experience with:
    • Service meshes and API gateways
    • GitHub Actions, ArgoCD, or similar CI/CD tools
    • Prometheus, Grafana, Loki, Tempo, OpenTelemetry
  • Exposure to high-throughput storage systems, object stores, or distributed filesystems.
  • Prior experience in an AI infra, cloud platform, or high-scale SaaS startup.

Who You Are

  • You think in systems and love reducing complexity with automation.
  • You’re calm under pressure and comfortable owning production systems.
  • You enjoy partnering closely with engineers and aren’t afraid to dive into code.
  • You care about reliability, performance, and craftsmanship.
  • You thrive in fast-moving, zero-to-one startup environments.

Why Join Us

  • Work on the core infrastructure powering cutting-edge generative AI.
  • Collaborate with world-class infra, ML, and product engineers.
  • High ownership over architecture, tooling, and operational practices.
  • Competitive compensation, equity, and strong growth potential.
  • Flexible remote/hybrid environment.

How to Apply
Please reply to the application questions.

© Copyright 2026 Arc