Apollo Solutions

Staff Software Engineer - LLM Inference

Location: Remote anywhere

Salary Estimate: N/A

Seniority: Staff

Tech stacks: Software Development, AI, Kubernetes, +20

Permanent role

Senior/Staff Software Engineer - LLM Inference

Salary: €70K – €120K

Fully Remote Globally

Apollo Solutions has proudly partnered with an early-stage AI start-up backed by top venture capital. The start-up is building AI that's predictable and production-ready. Its remote-first team ships open-source tools for novel technology with strong community adoption, and it is well funded to move fast.

The role

Own step-function improvements in LLM inference for structured outputs. This is hands-on systems work where millisecond wins matter: cut latency, raise throughput, and drive down cost across real workloads.

What you’ll tackle

  • Push and tune inference stacks (e.g., vLLM, SGLang, TensorRT) to unlock meaningful performance gains.
  • Build single-node, multi-GPU pipelines; optimize communication with NCCL.
  • Profile kernels and memory to remove bottlenecks and variance.
  • Make structured generation fast, reliable, and easy to integrate across services and OSS.
  • Harden deployments: observability, auto-scaling, fault tolerance, safe rollouts.
  • Share learnings through docs, examples, and upstream contributions.

You’ll thrive here if you have

  • Proven experience operating or extending inference engines (vLLM/SGLang/TensorRT).
  • Distributed inference chops (multi-GPU on one host) and low-latency comms (NCCL).
  • Hands-on NVIDIA GPU knowledge (CUDA, SMs, memory hierarchy).
  • A record of measurable wins (e.g., 20%+ throughput from kernel/runtime optimizations).
  • LLM MLOps background (monitoring, scaling, resilience for inference services).
  • Strong Python; Rust curiosity or experience.
  • Comfort with Docker, Kubernetes, and Linux internals.

Why this team

  • Real frontier work: Structured generation is early; innovation is the default.
  • Remote-first: Work from anywhere. Clear writing, intentional meetings.
  • Fair package: Market-aligned comp for an early-stage startup + equity, health benefits, retirement plan (where applicable), and the hardware you need (GPUs included).

If you're interested, please apply now!

