Replicate makes it easy for software engineers to run and customize machine learning models in the cloud. With a library of thousands of open-source models, you can get started with one line of code—or fine-tune and deploy your own models when you need something custom. We handle the infrastructure, so you can focus on building. Our team comes from places like Docker, GitHub, and NVIDIA, and we’re obsessed with making AI as intuitive as deploying a web app. We build in public, ship fast, and care about getting the details right.
The Platform team at Replicate oversees the entire lifecycle of models, from packaging and deployment to serving, scaling, and monitoring. You’ll be developing the infrastructure that supports thousands of models and powers millions of predictions daily. This is a chance to build something truly innovative, where each decision you make has a tangible impact and allows your creativity to shine.
What You’ll Be Doing
Designing and building our deployment and model-serving platform.
Building technology to operate the latest advancements in the ML and AI space.
Designing systems to maximize the utilization and reliability of our Kubernetes clusters and GPUs, including multi-regional traffic shifting and failover capabilities.
Owning and optimizing fair and reliable task allocation and queuing across a diverse set of customers with heterogeneous workloads.
Working with our Models team to speed up model inference through techniques like caching, weights management, machine configurations, and runtime optimizations in Python and PyTorch.
Working with technologies such as:
Python, Go, and Node.js
Kubernetes and Terraform
Redis, Google BigQuery, and PostgreSQL
We’re looking for the right person, not just someone who checks boxes, but it’s likely you have…
These aren’t hard requirements, but we definitely want to talk with you if…
You’ll be working from our beautiful office in the Mission, San Francisco, at least 3 days a week.
Compensation Range: $230K - $280K