Arc Exclusive

US - SLURM and K8S Infrastructure and Systems Engineer

Location

Remote restrictions apply

Salary

US$90K - 130K

Min. experience

5+ years

Required skills

Golang, HPC, HPC job schedulers, Kubernetes

Full-time role
Posted 2 days ago
Actively recruiting / 10 applicants

At Cedana, we are solving a problem many thought impossible: the seamless, live migration of active CPU and GPU containers.

We're building the next generation of cloud infrastructure, founded on our pioneering work in checkpoint/restore technology. This isn't just an incremental improvement; it's a fundamental shift that makes distributed computation truly portable, elastic, and resilient across the entire stack.

We are backed by visionary investors, including a co-founder of OpenAI, the former Chief Architect of Slack, and founding members of Facebook AI, who all see the transformative potential of our mission. To achieve this vision, we're looking for brilliant systems engineers: the kind who are obsessed with understanding how computing works from the silicon up, who live deep in the container stack, and who understand Kubernetes well beyond the surface.

Systems Engineer, Distributed Compute Infrastructure

If you thrive on solving deep, complex problems in uncharted territory, we invite you to join us.

What You Will Do

As a core member of our engineering team, you will build and fortify the "magic" that powers our platform. You will operate across the entire compute stack, from the Linux kernel to our managed Kubernetes offering, to deliver a product that is both powerful and exceptionally reliable.

  • Design and Build: Architect and implement core components of our system, leveraging our unique insights into checkpointing, virtualization, and container orchestration to create capabilities that don't exist anywhere else.
  • Engineer Rock-Solid Reliability: Enhance the stability and performance of our entire system, from kernel-level interactions and hypervisor optimizations to our managed Kubernetes cloud platform.
  • Partner with Customers: Work directly with customers to solve their most complex infrastructure challenges, acting as a trusted technical partner and gathering insights that drive our product roadmap.
  • Develop Sophisticated Tooling: Build and refine our internal observability and alerting infrastructure to proactively identify and resolve issues anywhere in the stack, ensuring our systems meet the highest standards of performance and availability.

Who You Are

You aren't a traditional full-stack developer. You are driven by a deep curiosity to understand every layer of the technology you work with. You have a track record of solving challenging problems in complex systems and a passion for building robust, high-performance infrastructure.

  • A Systems Thinker: You have the intellectual bandwidth and desire to learn the full compute stack, from hardware and device drivers to the OS kernel, container runtimes, and distributed systems.
  • A Creative Problem-Solver: You possess a history of tackling difficult technical challenges, perhaps in compilers, distributed systems, embedded systems, or highly available platforms.
  • A Proven Collaborator: You have a demonstrated ability to work effectively with a team of high-caliber engineers to achieve ambitious goals.

Required Experience

  • Deep Kubernetes Expertise: You have a strong command of Kubernetes internals, including controllers, operators, CRDs, the API machinery, and scheduling. You have experience writing Kubernetes controllers or services from scratch.
  • Linux & Container Internals: You possess a fundamental understanding of Linux/UNIX (system libraries, services, networking, kernel/user-space interaction) and containerization tech (containerd/cri-o, runc, cgroups, namespaces, seccomp).
  • Understanding of Networking: You understand how packets flow in Kubernetes, and have hacked around with or deployed tooling like CNI plugins, Cilium, and/or Istio.
  • Production Experience: You have hands-on experience scaling infrastructure, managing production-level Kubernetes clusters, and working with infrastructure-as-code tools like Helm and Terraform.
  • Low-Level Familiarity: You are comfortable with concepts in low-level systems programming.
  • On-Call Ready: You understand the importance of reliability and are familiar with being on-call. (Our founders have extensive on-call experience and are committed to building a sane, sustainable rotation.)

Bonus Points If You Have

  • Contributed to open-source projects like Kubernetes, containerd, or the Linux kernel.
  • Experience with running multi-node and multi-cluster GPU workloads (training/inference) at scale.
  • Experience with virtualization in Kubernetes, such as KubeVirt or Kata Containers.
  • Familiarity with HPC environments (SLURM, MPI, RDMA) or GPU-centric Kubernetes tooling (Kueue, Kubeflow, KServe).
  • A passion for debugging weird kernel panics just as much as you enjoy writing elegant Go or Rust code.
  • Experience leading teams or mentoring other engineers in a remote environment.
  • Written your own container runtime!
