Senior/Staff Software Engineer - LLM Inference
Salary: €70K – €120K
Fully Remote Globally
Apollo Solutions has proudly partnered with an early-stage AI start-up backed by top venture capital. The company is building AI that's predictable and production-ready. Its remote-first team ships open-source tools for novel technology with strong community adoption, and it is well funded to move fast.
The role
Own step-function improvements in LLM inference for structured outputs. This is hands-on systems work where millisecond wins matter: cut latency, raise throughput, and drive down cost across real workloads.
What you’ll tackle
- Push and tune inference stacks (e.g., vLLM, SGLang, TensorRT) to unlock meaningful performance gains.
- Build single-node, multi-GPU pipelines; optimize communication with NCCL.
- Profile kernels and memory to remove bottlenecks and variance.
- Make structured generation fast, reliable, and easy to integrate across services and OSS.
- Harden deployments: observability, auto-scaling, fault tolerance, safe rollouts.
- Share learnings through docs, examples, and upstream contributions.
You’ll thrive here if you have
- Proven experience operating or extending inference engines (vLLM/SGLang/TensorRT).
- Distributed inference chops (multi-GPU on one host) and low-latency comms (NCCL).
- Hands-on NVIDIA GPU knowledge (CUDA, SMs, memory hierarchy).
- A record of measurable wins (e.g., 20%+ throughput gains from kernel/runtime optimizations).
- LLM MLOps background (monitoring, scaling, resilience for inference services).
- Strong Python; Rust curiosity or experience.
- Comfort with Docker, Kubernetes, and Linux internals.
Why this team
- Real frontier work: Structured generation is early; innovation is the default.
- Remote-first: Work from anywhere. Clear writing, intentional meetings.
- Fair package: Market-aligned comp for an early-stage startup + equity, health benefits, retirement plan (where applicable), and the hardware you need (GPUs included).
If you're interested, apply now!