Senior SWE – GPU/Networking/AI Infra – Up to 250K Base – Remote
This position is open to candidates working remotely in the United States or Canada.
Our client is a cloud technology company driving the next generation of AI infrastructure. They enable organizations to build and scale AI and ML solutions without large in-house teams or heavy upfront infrastructure costs. Their global team of engineers works at the forefront of GPU cloud computing, helping businesses across industries solve complex, real-world problems.
The company operates with a flat structure, minimal bureaucracy, and a strong focus on ownership, speed, and technical excellence. Engineers work closely with customers and internal teams to design scalable solutions and influence product direction, creating direct impact on how modern AI platforms are built and operated.
The Role
They are looking for someone to build the network automation and observability systems that power a global GPU fleet. This is a hands-on engineering role at the intersection of software and network infrastructure.
You will work with cutting-edge NVIDIA hardware that most engineers never get close to, and you will help design systems that are often redesigned within weeks, because that is the pace. If you thrive in environments where speed, autonomy, and real engineering ownership matter, this role is for you.
Responsibilities
- Build and maintain the services and tools that keep their global network of thousands of GPU nodes running smoothly
- Build tooling that sits between the network core and the cloud platform running on top of it
- Create monitoring and alerting that gives the team clear visibility and helps resolve issues faster
- Make network changes less risky through solid review processes and safeguards
- Work closely with network engineers and SREs to turn day-to-day pain points into reliable internal tools
Tech & Skills Requirements
- 10+ years of professional software engineering experience, or equivalent practical background
- Proficiency in Go, or a genuine readiness to switch to it; Python experience is also welcome
- You don't need to be a network expert, but a strong interest in infrastructure and networking is expected
- Strong communication skills and the ability to work autonomously in a fast-paced, high-trust environment
Bonus Points For
- Background in network engineering or SRE: someone who understands operational realities, not just code
- Experience at companies operating at hyperscale, such as Cloudflare, major cloud providers, or similar
- Familiarity with Prometheus-compatible monitoring stacks (e.g. VictoriaMetrics) or large-scale telemetry systems
- Exposure to Juniper or other vendors' networking equipment
- Comfort debugging OSS projects and contributing fixes across languages
Interview Process
- Preliminary interview
- Technical coding interview
- Final technical deep dive
The Offer
- Base salary up to 250K USD plus bonus and RSUs
- Remote role within the US/Canada
- No take-home assignments at any stage of the process