Role Overview
We are building the next generation of AI compute. Our technology breaks the "Memory Wall" by fusing software-level semantic compression with hardware-level memory tiering. We are looking for a systems-level visionary to help us turn theoretical silicon exploits into enterprise-ready production code on NVIDIA and AMD hardware.
Responsibilities
- Develop and optimize high-performance CUDA kernels and C++ modules that manage massive-scale memory architectures (HBM, DDR5).
- Architect and implement zero-copy memory-tiering solutions that allow consumer and enterprise GPUs to process multi-million token context windows.
- Collaborate with AI agents to rapidly research, prototype, and refine hardware-level optimizations for NVIDIA and AMD silicon.
- Lead the transition of academic-level silicon hacks into stable, fault-tolerant, and production-ready enterprise software.
Required Skills
- Mastery of modern C++ (C++20/23) and CUDA.
- Deep understanding of hardware memory technologies and interconnects (HBM3, GDDR7, PCIe Gen 5, DMA).
- Experience in compiler design or high-performance computing (HPC) for AI/LLM workloads.
- A "Hardware-First" mindset: you understand that software is limited by the laws of silicon and electricity.
- Expertise in using agentic AI tools to accelerate complex systems-engineering workflows.
Nice to Have
- Familiarity with advanced memory management techniques.
- Experience collaborating with hardware design teams.