An innovative AI company, backed by leading investors, is seeking a Senior Software Engineer to help shape the foundation for decentralized AI development at scale. The company provides a cutting-edge platform that enables researchers and engineers to train state-of-the-art models collaboratively, combining distributed training infrastructure with an intuitive developer experience.

The Role

This hybrid role spans both the developer platform and the infrastructure layer, offering the opportunity to work on two key areas:

1. AI Workload Management Platform – Developing user-friendly tools for managing AI workloads.

2. Distributed Training Infrastructure – Building high-performance infrastructure to support large-scale model training.

Key Responsibilities

Platform Development

• Develop intuitive web interfaces for AI workload management and monitoring.

• Build REST APIs and backend services using Python.

• Implement real-time monitoring and debugging tools.

• Create user-facing features for resource allocation and job scheduling.

Infrastructure Development

• Design and implement distributed training infrastructure in Rust.

• Develop high-performance networking and coordination components.

• Automate infrastructure provisioning with tools like Ansible.

• Manage cloud resources and container orchestration (Kubernetes).

• Implement scheduling systems for heterogeneous hardware (CPU, GPU, TPU).

Technical Skills & Experience

Required Skills

Platform Development:

• Strong backend development experience in Python (FastAPI, async).

• Proficiency in modern frontend technologies (TypeScript, React/Next.js, Tailwind).

• Experience building developer tools and dashboards.

• Strong understanding of RESTful API design.

Infrastructure Development:

• Systems programming expertise with Rust.

• Hands-on experience with infrastructure automation (Ansible, Terraform).

• Proficiency in container orchestration (Kubernetes).

• Familiarity with cloud platforms (GCP preferred).

• Experience with observability tools (Prometheus, Grafana).

Nice to Have:

• Experience with GPU computing and ML infrastructure.

• Understanding of AI/ML model training architectures.

• Background in high-performance networking.

• Contributions to open-source infrastructure projects.

• Experience with real-time systems (WebSockets, streaming).

About asobbi
