Tech Lead — Software Engineer (AI Infrastructure & Model Serving)
Full-Time · Remote or Hybrid
We are building a high-performance AI platform that provides fast inference, scalable model serving, evals, routing, and developer-friendly APIs.
Our mission is to provide the fastest, most reliable, and most cost-efficient LLM infrastructure for developers and enterprises.
We are looking for a Tech Lead Software Engineer with deep engineering instincts who can architect, build, and scale our core AI infra from the ground up.
This role is ideal for someone who has experience in LLM inference, GPU systems, distributed compute, low-latency APIs, or high-scale backend engineering.
As our Tech Lead, you will own the architecture, implementation, and evolution of our core platform. You will lead engineering decisions, work closely with founders on product direction, and build a team around you as we scale.
You will work across:
● High-performance model inference
● GPU/accelerator orchestration
● Distributed serving systems
● API gateway + developer platform
● Evals, model routing, logging, observability
● Reliability, scaling, and infra automation
Must-Have
● 5+ years in backend, infra, or systems engineering
● Strong experience with:
Nice-to-Have
● Experience with:
● Build a vLLM-like inference engine with custom optimizations
● Design a dynamic batching service for 100k+ tokens/sec throughput
● Build the routing layer that selects models based on latency/cost constraints
● Implement streaming WebSocket APIs for high-speed generation
● Optimize GPU clusters for maximum throughput per dollar
● Build tooling for evals, performance dashboards, and observability
● Architect multi-model hosting across heterogeneous GPU pools
● Build the core technical engine of an AI infra company
● Massive ownership and autonomy
● Work directly with founders
● Fast iteration and real product impact
● Competitive salary + meaningful early equity
● Opportunity to build and lead an engineering team
Share your resume, GitHub, or examples of relevant work.
If you have experience with LLM inference, GPU optimization, distributed systems, or building developer-first APIs, we strongly encourage you to apply.