About the Opportunity:
The organization is developing an AI Cloud platform, a unified multi-cloud service that connects local development with global-scale deployment. The Principal Software Engineer will define the technical vision and lead the design and implementation of distributed systems for the AI Cloud, partnering with technical leaders to architect scalable, reliable, and secure infrastructure for a global developer audience.
Responsibilities:
• Define and drive the long-term technical strategy for control and data plane services in the AI Cloud
• Architect highly available, multi-region systems across multiple cloud providers
• Design APIs and service abstractions that integrate local and enterprise cloud services
• Establish standards for reliability, scalability, and observability across the platform
• Lead cross-functional technical discussions and influence architectural decisions
• Design and implement distributed systems for orchestration, service discovery, and lifecycle management
• Build and operate control plane components for multi-tenant workloads and cloud networking
• Develop infrastructure that supports high performance, intelligent scaling, and automated failover
• Ensure security, data integrity, and compliance across the infrastructure
• Partner with platform and product teams to deliver developer-friendly APIs and cloud experiences
• Align technical direction with business objectives for cloud growth and platform unification
• Evaluate and guide adoption of emerging technologies
• Drive initiatives to reduce latency, optimize cost, and improve performance
• Define metrics and SLAs for reliability and scalability
• Mentor senior, staff, and principal engineers
• Lead design reviews and guide production system decisions
• Drive operational excellence, ownership, and innovation
• Collaborate with engineering and product leadership on priorities and resource planning
Requirements:
• 10+ years of software engineering experience, including 3+ years in technical leadership roles (Staff or Principal level)
• Experience designing and building highly scalable distributed systems in production environments
• Deep understanding of cloud infrastructure (AWS, Azure, GCP, or OCI), including compute, networking, and storage
• Proficiency in Go, Rust, or Java
• Expertise in Kubernetes, microservices, and service mesh architectures
• Strong foundation in observability, CI/CD, and infrastructure-as-code (Terraform, Pulumi, or CloudFormation)
• Experience operating high-availability (99.99%+) production systems
• Exceptional communication skills and ability to influence across technical and business domains
• Experience designing multi-cloud or cross-cloud abstractions and orchestration layers (preferred)
• Knowledge of container lifecycle management, networking, and policy enforcement (preferred)
• Prior experience in developer infrastructure, PaaS, or hyperscale SaaS environments (preferred)
• Background in contributing to open-source projects or developer-focused platforms is a plus
Benefits & Perks:
• Flexible remote work
• Quarterly Whaleness Days
• Home office setup support
• 16 weeks of paid parental leave
• Technology stipend ($100 net/month)
• PTO plan
• Quarterly company-wide hackathons
• Training stipend for conferences and courses
• Equity participation
• Swag
• Medical benefits, retirement, and holidays (vary by country)
Note:
RemoteHunter is not the Employer of Record (EOR) for this role. Our purpose in this opportunity is to connect exceptional candidates with leading employers. We help job seekers worldwide discover roles that match their goals and guide them to complete their full application directly through the hiring company’s career page or ATS.