NVIDIA

Senior Software Engineer, AI Systems - Nemo RL

Location

Remote restrictions apply

Salary Estimate

N/A

Seniority

Senior

Tech stacks

Software Development
AI
Deep Learning

Permanent role
3 days ago

We are seeking highly skilled and motivated software engineers to join our Nemo RL team. You will empower AI practitioners to develop and deploy large language models (LLMs) using reinforcement learning (RL) techniques on the Nemo RL framework, productively and affordably. If you have experience with multi-node distributed training jobs and are passionate about solving the challenges associated with high-performance RL systems for LLMs, we invite you to join our team!

What You’ll Be Doing

  • Design and implement highly efficient distributed training systems for large-scale RL models.
  • Optimize parallelism strategies to improve performance and scalability across hundreds or thousands of GPUs.
  • Develop low-level systems components and algorithms to maximize throughput and minimize memory and compute bottlenecks.
  • Productionize the training systems with fault-tolerance capabilities and uncompromising software quality.
  • Collaborate with researchers and engineers to productionize cutting-edge model architectures and training techniques.
  • Contribute to the design of APIs, abstractions, and UX that make it easier to scale models while maintaining usability and flexibility.
  • Profile, debug, and tune performance at the model, system, and hardware levels.
  • Participate in design discussions, code reviews, and technical planning to ensure the product aligns with the business goals.
  • Stay up to date with the latest advancements in large-scale model training and help translate research into practical, robust systems.

What We Need To See

  • Bachelor’s, Master’s, or PhD degree in Computer Science/Engineering, Software Engineering, a related field, or equivalent experience.
  • 3+ years of experience in software development, preferably with Python and C++.
  • Deep understanding of machine learning pipelines and workflows, distributed systems, parallel computing, and high-performance computing principles.
  • Hands-on experience with large-scale training of deep learning models using frameworks like PyTorch, Megatron Core, or DeepSpeed.
  • Experience optimizing compute, memory, and communication performance in large model training workflows.
  • Familiarity with GPU programming, CUDA, NCCL, and performance profiling tools.
  • Solid grasp of deep learning fundamentals, especially as they relate to RL and training dynamics.
  • Ability to work closely with both research and engineering teams, translating evolving needs into technical requirements and robust code.
  • Excellent problem-solving skills, with the ability to debug complex systems.
  • A passion for building high-impact tools that push the boundaries of what’s possible with large-scale AI.

Ways To Stand Out From The Crowd

  • Experience building and optimizing LLM pre-training or post-training frameworks such as DeepSpeed, torchtitan, Nanotron, or verl.
  • Experience building and optimizing LLM inference engines such as vLLM or SGLang.
  • Experience building ML compilers such as Triton or Torch Dynamo/Inductor.
  • Experience working with cloud platforms (e.g., AWS, GCP, or Azure), containerization tools (e.g., Docker), and orchestration infrastructures (e.g., Kubernetes, Slurm).
  • Exposure to DevOps practices, CI/CD pipelines, and infrastructure as code.

At NVIDIA, we believe artificial intelligence (AI) will fundamentally transform how people live and work. Our mission is to advance AI research and development to create groundbreaking technologies that enable anyone to harness the power of AI and benefit from its potential. Our team consists of experts in AI, systems, and performance optimization. Our leadership includes world-renowned experts in AI systems who have received multiple academic and industry research awards. If you've hacked the inner workings of PyTorch, written many CUDA/HIP kernels, developed and optimized inference services or training workloads, built and maintained large-scale Kubernetes clusters, or simply enjoy solving hard problems, feel free to drop an application!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 116,250 CAD - 201,500 CAD for Level 3, and 142,500 CAD - 247,000 CAD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until August 22, 2025.

JR2002280

About NVIDIA

👥10000-
📍Santa Clara, CA

NVIDIA benefits and support

🏥Health insurance
🌴Retirement pension
🌞Healthy living stipend
📕Learning stipend
🍼Maternity/paternity leave
⌚️Flexible working hours
📊Stock options
🗺Company retreat