Principal Systems Engineer (C++/CUDA) - PT Freelance - Americas/EMEA

Location

Remote restrictions apply

Hourly rate

Min. experience

5+ years

Hours per week

15 hours

Duration

12 weeks

Required skills

CUDA, C++, GPU, AMD products

Freelance job

Juliana Torrisi is in direct contact with the company and can answer any questions you may have.

Juliana Torrisi, Recruiter

Role Overview

We are building the next generation of AI compute. Our technology breaks the "Memory Wall" by fusing software-level semantic compression with hardware-level memory tiering. We are looking for a systems-level visionary to help us turn theoretical silicon exploits into enterprise-ready production code on NVIDIA and AMD hardware.

Responsibilities

Develop and optimize high-performance CUDA kernels and C++ modules that manage massive-scale memory architectures (HBM, DDR5).
Architect and implement zero-copy memory-tiering solutions that allow consumer and enterprise GPUs to process multi-million-token context windows.
Collaborate with AI agents to rapidly research, prototype, and refine hardware-level optimizations for NVIDIA and AMD silicon.
Lead the transition of academic-level silicon hacks into stable, fault-tolerant, and production-ready enterprise software.

Required Skills

Mastery of modern C++ (C++20/23) and CUDA.
Deep understanding of hardware memory physics (HBM3, GDDR7, PCIe Gen 5, DMA).
Experience in compiler design or high-performance computing (HPC) for AI/LLM workloads.
A "hardware-first" mindset: you understand that software is limited by the laws of silicon and electricity.
Expertise in using agentic AI tools to accelerate complex systems-engineering workflows.