Tech Lead Architect - AI Infrastructure & Distributed Systems
Full-Time · Remote or Hybrid · Founding Team
We are building next-generation AI infrastructure for ultra-fast model inference, scalable LLM hosting, evals, model routing, observability, and developer-friendly APIs.
We are looking for a Tech Lead Architect with deep experience in distributed systems, ML serving, high-performance compute, GPU cluster architecture, and cloud-scale engineering: someone capable of defining our technical vision, designing core systems from scratch, and leading engineering as we scale.
If your background spans strong technical architecture, cloud infrastructure, AI systems, and large-scale compute, this role is for you.
As the Tech Lead Architect, you will be responsible for the foundational architecture of our AI platform. You will work hands-on while also guiding long-term technical direction and building key systems that power our platform.
This is a founding-level role with extremely high ownership.
You will lead architecture for:
● LLM inference + serving stack
● Multi-GPU orchestration, scheduling, routing
● Distributed systems for large-scale model hosting
● High-throughput, low-latency developer APIs
● Observability, logging, monitoring, evals
● Cloud infra automation and cost-efficient scaling
Must-Have
● 7+ years of experience in software engineering, infrastructure, or systems architecture
● Strong experience with:
Nice-to-Have
● Experience building or contributing to inference frameworks (vLLM, TensorRT-LLM, TGI)
● Deep understanding of LLM internals, KV cache, quantization, tensor parallelism
● Experience with data streaming, tracing, profiling, or log-based architectures
● Experience with ML training, fine-tuning pipelines, or HF ecosystem
● Startup/founding experience or appetite for zero-to-one environments
● Background in cloud cost optimization or infra financial modeling
What You'll Build
● Architect an entire LLM serving platform that rivals Fireworks.ai throughput
● Build distributed multi-node inference with near-linear scaling
● Design a routing layer that chooses the best model based on latency/cost/accuracy
● Optimize GPU clusters for maximum tokens per dollar
● Create a unified logging + observability system for AI workloads
● Architect a fine-tuning and evals platform integrated with the serving layer
● Build a blueprint for expanding globally across multi-region data centers
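As an illustration of the routing layer described above, model selection by latency/cost/accuracy can be sketched as a weighted-score selector. This is a minimal sketch, not our implementation; all names, fields, and weights below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ModelStats:
    name: str
    p50_latency_ms: float      # observed median latency
    cost_per_1k_tokens: float  # dollars per 1k output tokens
    accuracy: float            # eval score in [0, 1]

def route(models, w_latency=0.3, w_cost=0.3, w_accuracy=0.4):
    """Pick the model with the best weighted score.

    Latency and cost are normalized against the worst candidate so
    that lower is better; accuracy is already higher-is-better.
    """
    max_lat = max(m.p50_latency_ms for m in models)
    max_cost = max(m.cost_per_1k_tokens for m in models)

    def score(m):
        return (w_latency * (1 - m.p50_latency_ms / max_lat)
                + w_cost * (1 - m.cost_per_1k_tokens / max_cost)
                + w_accuracy * m.accuracy)

    return max(models, key=score)
```

In practice the weights would be set per request (e.g. a latency-sensitive chat endpoint vs. a batch eval job), and the stats would come from the observability layer rather than static config.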
Why Join
● Build foundational systems for a new AI infra company
● Solve extremely hard technical problems with massive impact
● Work directly with founders who understand the tech deeply
● Define the engineering culture and architecture
● Fast execution environment with ownership over entire systems
● Competitive salary + founder-level equity
Please include any examples of:
● Distributed systems architecture
● LLM inference or GPU-related work
● Large-scale backend or cloud systems design
● Leadership roles or architecture documents you’ve written