[Open to candidates based in the UK / US and Western Europe]
Full-Time · Remote OR Hybrid · Founding Team Opportunity
About Us
We are building a Gen AI Acceleration Cloud: an end-to-end platform for the full generative AI lifecycle. Our focus is delivering blazing-fast LLM inference, scalable fine-tuning, and modern AI cloud infrastructure built on GPUs, SmartNICs/DPUs, and ultra-fast networking fabrics.
Our platform powers mission-critical workloads with:
● On-demand & managed Kubernetes clusters
● Slurm-based training clusters
● High-performance inference services
● Distributed fine-tuning and eval pipelines
● Global data centers & heterogeneous GPU fleets
We are looking for a Senior Software Engineer to design, build, and scale the core systems behind our AI cloud.
What You’ll Work On
High-Performance AI Cloud Infrastructure
● Design and maintain fault-tolerant, high-availability backend services running across global data centers.
● Build operators and automation systems for:
○ GPU management
○ Infiniband partitioning
○ VM provisioning
○ High-throughput storage provisioning
LLM & GPU Virtualization Platform
● Build the IaaS software layer for new GPU clusters with thousands of next-gen accelerators (H100, GB200, GB300).
● Work on scalable GPU virtualization (PCIe passthrough, MIG, SR-IOV, VFIO).
Massive-Scale Storage & Data Systems
● Contribute to a global multi-exabyte, high-performance object store optimized for pretraining datasets.
● Build distributed data loaders, caching layers, metadata services, and throughput-optimized pipelines.
Observability, Reliability & Automation
● Develop advanced observability stacks (Prometheus, Grafana, OpenTelemetry).
● Design automated node lifecycle management for large-scale distributed training and inference.
● Build robust testing frameworks for resiliency, failover, and fault tolerance.
Core Platform Engineering
● Contribute to the core internal + open-source platform components.
● Write tooling, SDKs, and documentation for developer-facing services.
● Research decentralized AI workloads and build reference architectures.
Requirements
Fundamentals
● 5+ years of production software engineering experience.
● Strong proficiency in one or more backend languages (Golang highly preferred; Rust/Python also valued).
● 5+ years building high-performance, well-tested, production-grade distributed services.
Cloud & Systems Experience
● Experience with distributed microservices across AWS/GCP/Azure.
● Deep understanding of systems fundamentals:
○ Concurrency
○ Memory management
○ High-performance I/O
○ Distributed consensus
○ Large-scale system design
Kubernetes / Infrastructure Expertise (Big Plus)
● Kubernetes internals: custom operators, CRDs, schedulers, or networking/storage plugins.
● Experience with Cluster API, KubeVirt, or similar orchestration tooling.
Virtualization / Compute (Big Plus)
● Experience with hypervisors (QEMU/KVM, cloud-hypervisor).
● PCIe passthrough, SR-IOV, GPU virtualization, MIG, NVLink topologies.
● Experience with DPUs/SmartNICs.
Networking (Big Plus)
● Infiniband / RDMA
● VLAN/VXLAN/VPC
● OVS/OVN
● High-performance DC networking
High-Performance Compute (Plus)
● CUDA, NCCL, GPU drivers, parallel training stacks
● Experience with GPU scheduling, workloads, and distributed ML
Infrastructure Automation & Tooling (Expected)
● Terraform, Ansible, CI/CD
● GitHub Actions, ArgoCD
● Prometheus, Grafana, ELK, OpenTelemetry
Preferred Experience
● Built or operated IaaS/PaaS systems
● Experience with large-scale storage systems (Ceph, Lustre, or custom object stores)
● Knowledge of vLLM, TensorRT-LLM, TGI, or other LLM-serving frameworks
● Experience building infra for ML, training, inference, or fine-tuning
Responsibilities
● Perform architecture & research for distributed and decentralized AI workloads.
● Build and maintain foundational infrastructure powering training, inference, and fine-tuning.
● Contribute to core, open-source platform components.
● Own end-to-end services from design → implementation → operations.
● Create testing frameworks for robustness, failover, and performance.
● Collaborate across hardware, product, and ML teams to design next-gen infra.
Who You Are
● A deeply technical engineer who thrives in complex systems work.
● Strong communicator who writes clear design docs.
● Curious, low-ego, and great at collaborating with cross-functional teams.
● Motivated by building world-class AI infrastructure from the ground up.
● Thrives in zero-to-one, fast-moving startup environments.
Compensation
● Competitive salary
● Meaningful early equity
● Benefits
● Salary determined by experience and location.