
Tech Lead - Software Engineer (AI Infrastructure & Model Serving) - Perm - US,EU,UK

Location

Remote restrictions apply

Salary

US$120K - 180K

Min. experience

5+ years

Required skills

Python, Golang, AWS, Software architecture, Docker, Kubernetes

Full-time role
Posted 16 hours ago
Actively recruiting / 16 applicants

Tech Lead — Software Engineer (AI Infrastructure & Model Serving)
Full-Time · Remote or Hybrid

About Us

We are building a high-performance AI platform that combines fast inference, scalable model serving, evals, routing, and developer-friendly APIs.

Our mission is to provide the fastest, most reliable, and most cost-efficient LLM infrastructure for developers and enterprises.

We are looking for a Tech Lead Software Engineer with deep engineering instincts who can architect, build, and scale our core AI infra from the ground up.

This role is ideal for someone who has experience in LLM inference, GPU systems, distributed compute, low-latency APIs, or high-scale backend engineering.

Role Overview

As our Tech Lead, you will own the architecture, implementation, and evolution of our core platform. You will lead engineering decisions, work closely with founders on product direction, and build a team around you as we scale.
You will work across:
● High-performance model inference
● GPU/accelerator orchestration
● Distributed serving systems
● API gateway + developer platform
● Evals, model routing, logging, observability
● Reliability, scaling, and infra automation

What You’ll Own

  1. Core Infrastructure Architecture
    ● Architect the LLM inference stack (load-balancing, batching, token streaming).
    ● Optimize GPU utilization (tensor parallelism, quantization, batching, KV cache).
    ● Design distributed systems for high throughput and low latency.
    ● Lead model hosting: LLMs, diffusion models, multimodal, embeddings.
  2. Backend Platform Engineering
    ● Build APIs and SDKs (Python/JS) for developers.
    ● Implement observability tools: token logs, latency traces, request analytics.
    ● Build model routing layers based on cost/latency/performance tradeoffs.
    ● Integrate evals (benchmarks, datasets, scoring) into platform surfaces.
  3. Infrastructure Scaling & Reliability
    ● Create highly available serving clusters with autoscaling.
    ● Implement CI/CD, container orchestration, and deployment tooling.
    ● Improve system performance, throughput, and cost efficiency.
  4. Technical Leadership
    ● Drive engineering best practices and code quality.
    ● Make key architectural decisions and own the tech roadmap.
    ● Mentor engineers and help build out the initial engineering team.
    ● Work cross-functionally with product/design to define user-facing features.

What We’re Looking For

Must-Have
● 5+ years in backend, infra, or systems engineering
● Strong experience with:
  ○ Python, Go, or Rust
  ○ Cloud infrastructure (AWS/GCP/Azure)
  ○ Containers + orchestration (Docker, Kubernetes, Ray, or similar)
● Ability to design and implement low-latency, high-scale services
● Experience owning architecture from 0 → 1 or leading major systems
● Strong debugging skills with a performance-oriented mindset

Nice-to-Have
● Experience with:
  ○ LLM inference (vLLM, TensorRT-LLM, DeepSpeed, HuggingFace TGI)
  ○ Model quantization / LoRA / speculative decoding / paged attention
  ○ Distributed training or fine-tuning pipelines
  ○ CUDA/PyTorch, model inference kernels, or GPU programming
  ○ Distributed systems (microservices, RPC, autoscaling, scheduling)
  ○ GPU cluster management (NVLink, MIG, scheduling, multi-node topology)
  ○ Building developer tools or API-based platforms
● Startup or early-stage company experience
● Strong communication + leadership instincts

Example Problems You Might Work On

● Build a vLLM-like inference engine with custom optimizations
● Design a dynamic batching service for 100k+ tokens/sec throughput
● Build the routing layer that selects models based on latency/cost constraints
● Implement streaming WebSocket APIs for high-speed generation
● Optimize GPU clusters for maximum throughput per dollar
● Build tooling for evals, performance dashboards, and observability
● Architect multi-model hosting across heterogeneous GPU pools

Why Join Us

● Build the core technical engine of an AI infra company
● Massive ownership and autonomy
● Work directly with founders
● Fast iteration and real product impact
● Competitive salary + meaningful early equity
● Opportunity to build and lead an engineering team

How to Apply

Share your resume, GitHub, or examples of relevant work.
If you have experience with LLM inference, GPU optimization, distributed systems, or building developer-first APIs, we strongly encourage you to apply.
