Backend Engineer (APIs, Load Balancer, Scheduler)
Full-Time · Remote or Hybrid · High-Impact Role
Odyn builds transformative AI solutions on cutting-edge, high-performance infrastructure. We're seeking a Backend Engineer to architect the APIs, load balancers, and workload schedulers at the core of our GPU infrastructure platform.
Build Core Backend Systems & Load Balancing
● Architect RESTful and GraphQL APIs for model inference, fine-tuning, and resource
management.
● Build API gateways handling authentication, rate limiting, routing, and multi-tenant isolation.
● Develop intelligent load balancers distributing inference requests across GPU clusters based on latency, cost, and availability, with health checking and circuit breakers (a minimal sketch follows this list).
● Implement token streaming, batching, and request queuing for LLM inference workloads.
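To make the load-balancing bullet concrete, here is a minimal Go sketch of latency-aware backend selection with a per-backend circuit breaker. All names and numbers in it (the Backend type, the three-failure trip threshold, the 30-second cool-off) are illustrative assumptions, not Odyn's actual design.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sync"
	"time"
)

// Backend tracks a latency estimate and recent failures for one GPU
// inference endpoint. (Hypothetical type for illustration.)
type Backend struct {
	URL       string
	mu        sync.Mutex
	ewmaMs    float64   // exponentially weighted moving average latency
	failures  int       // consecutive failures seen
	openUntil time.Time // circuit open (backend excluded) until this time
}

// Record folds one request's outcome into the backend's state; three
// consecutive failures trip the circuit for a 30-second cool-off.
func (b *Backend) Record(latency time.Duration, err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= 3 {
			b.openUntil = time.Now().Add(30 * time.Second)
		}
		return
	}
	b.failures = 0
	const alpha = 0.2 // smoothing factor for the latency EWMA
	b.ewmaMs = alpha*float64(latency.Milliseconds()) + (1-alpha)*b.ewmaMs
}

// Pick returns the healthy backend with the lowest estimated latency,
// skipping any backend whose circuit is currently open.
func Pick(backends []*Backend) (*Backend, error) {
	var best *Backend
	bestMs := math.MaxFloat64
	now := time.Now()
	for _, b := range backends {
		b.mu.Lock()
		open := now.Before(b.openUntil)
		ms := b.ewmaMs
		b.mu.Unlock()
		if open {
			continue
		}
		if ms < bestMs {
			bestMs, best = ms, b
		}
	}
	if best == nil {
		return nil, errors.New("no healthy backends")
	}
	return best, nil
}

func main() {
	a := &Backend{URL: "http://gpu-a:8000", ewmaMs: 12}
	c := &Backend{URL: "http://gpu-b:8000", ewmaMs: 40}
	if best, err := Pick([]*Backend{a, c}); err == nil {
		fmt.Println("routing to", best.URL) // -> http://gpu-a:8000
	}
}
```

A production balancer would layer on active health-check probes, weighted scoring across cost and availability, and a half-open retry state; the EWMA-plus-trip loop above is just the core idea.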
Design Workload Schedulers & Resource Orchestration
● Design GPU resource allocation schedulers with bin-packing algorithms optimizing for topology (NVLink, PCIe, NUMA), workload type, and cost (a toy packer is sketched after this list).
● Implement preemption, checkpointing, and graceful eviction for multi-tenant environments.
● Integrate with Kubernetes schedulers, custom operators, or standalone orchestration
systems.
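As a sketch of the bin-packing bullet above: a best-fit-decreasing packer in Go that assigns each job to the feasible node with the least leftover GPUs, a crude stand-in for real topology scoring. Node, Job, and Pack are hypothetical names, not an existing API.

```go
package main

import (
	"fmt"
	"sort"
)

// Node and Job are deliberately minimal: a real scheduler would carry
// NVLink/PCIe/NUMA topology, workload type, and cost per node.
type Node struct {
	Name     string
	FreeGPUs int
}

type Job struct {
	ID   string
	GPUs int
}

// Pack places jobs largest-first (best-fit decreasing), choosing the
// feasible node with the fewest free GPUs so large topology-sensitive
// jobs keep whole nodes available.
func Pack(jobs []Job, nodes []*Node) map[string]string {
	placement := map[string]string{}
	sort.Slice(jobs, func(i, j int) bool { return jobs[i].GPUs > jobs[j].GPUs })
	for _, job := range jobs {
		var best *Node
		for _, n := range nodes {
			if n.FreeGPUs < job.GPUs {
				continue // not enough capacity on this node
			}
			if best == nil || n.FreeGPUs < best.FreeGPUs {
				best = n // tightest fit so far
			}
		}
		if best != nil {
			best.FreeGPUs -= job.GPUs
			placement[job.ID] = best.Name
		}
	}
	return placement
}

func main() {
	nodes := []*Node{{"node-a", 8}, {"node-b", 4}}
	jobs := []Job{{"train-1", 4}, {"infer-1", 1}, {"infer-2", 2}}
	fmt.Println(Pack(jobs, nodes))
}
```

A real scheduler would score NVLink domains and NUMA locality rather than raw free-GPU counts, and would route unplaceable jobs into a queueing or preemption path instead of silently skipping them.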
Ensure Reliability, Performance & Collaboration
● Build observability systems (Prometheus, Grafana, OpenTelemetry) tracking API
performance, load balancer health, and scheduler efficiency.
● Define and monitor SLOs for latency, throughput, and availability (see the instrumentation sketch after this list).
● Partner with infrastructure, ML, and product teams to optimize systems and shape developer-facing features.
● Participate in on-call rotation supporting production systems.
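For the SLO bullet, here is a minimal sketch of what "define and monitor" can look like in code, using the real prometheus/client_golang library; the metric name, labels, and /infer handler are made up for illustration.

```go
package main

import (
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Histogram of end-to-end inference latency; metric and label names are
// illustrative, not an actual schema.
var inferenceLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "inference_request_duration_seconds",
		Help:    "End-to-end inference request latency in seconds.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"model", "status"},
)

func main() {
	prometheus.MustRegister(inferenceLatency)

	http.HandleFunc("/infer", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// A real handler would route to the load balancer / GPU backend;
		// here we simulate work so the histogram has data.
		time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
		inferenceLatency.WithLabelValues("llama-70b", "ok").
			Observe(time.Since(start).Seconds())
	})

	// Prometheus scrapes this endpoint; SLO alerts query the result.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```

A latency SLO can then be tracked with a query like histogram_quantile(0.99, sum(rate(inference_request_duration_seconds_bucket[5m])) by (le)) and alerted on via burn-rate rules.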
What We're Looking For
● 4–7+ years of backend, distributed systems, or infrastructure engineering experience.
● Strong programming skills in Python, Go, or Rust; deep API development experience (REST/GraphQL).
● Proven distributed systems expertise: load balancing, service discovery, failover,
microservices, message queues.
● Production cloud experience (AWS/GCP/Azure); Kubernetes and Docker proficiency.
● Strong networking fundamentals (TCP/IP, DNS, TLS, HTTP); SQL/NoSQL database skills.
Nice to Have
● GPU-accelerated workloads or HPC scheduling experience.
● LLM inference frameworks (vLLM, TensorRT-LLM) or Kubernetes scheduling internals.
● API gateways (Kong, NGINX) or service meshes (Istio); infrastructure-as-code (Terraform,
Helm).
● Observability tools (Jaeger, Datadog); high-throughput systems (100k+ req/sec); AI
infrastructure startup experience.
Why Join Odyn
● Work at the frontier of AI infrastructure, building systems that power next-generation AI applications.
● Own critical components from the ground up.
● Collaborate with world-class teams.
● Competitive compensation + remote flexibility.
If you have experience building high-performance APIs, distributed schedulers, or load balancers for
cloud infrastructure, we strongly encourage you to apply.