Cedana brings live migration and GPU virtualization to hyperscalers, frontier labs and HPC centers.
What you will do
- Design and build highly efficient and resilient systems that scale, ranging from HPC-style system orchestration to multi-cloud instance management.
- You’ll span the stack from system-level to distributed programming to exploit our unique insights in compute. You’ll build high-performance, highly available and reliable infrastructure that powers critical use cases in AI, High Performance Computing, scientific research and developer infrastructure.
- You'll gain a deep understanding of the physics of modern compute architecture: profiling the bottlenecks in network, memory, storage, interconnects, CPU and GPU to understand the hierarchy of compute architecture. You'll use real-world compute jobs to measure and track performance. This will help you identify and solve failure, scaling, performance and other issues.
- You’ll gain deep insights into the compute systems that power the workloads themselves, and will share what you learn with our team through thoughtful written communication that will be a foundation for us.
What we are looking for
- Obsessed with understanding how every layer of compute works.
- Experience with, or a strong interest in learning, the compute stack: hardware, device drivers, OS kernel and system, k8s and distributed systems. You don’t need to know all of these coming in, but you should be curious and have the intellectual bandwidth to learn them quickly.
- You love to write your insights down and educate the team with humility. You should expect the same from us.
- Creative problem solving, multidisciplinary experience.
- Demonstrated ability to collaborate with others.
Preferred Qualifications
- Strong understanding of Linux and UNIX fundamentals (standard libraries, services, networking, kernel/user-space interaction).
- Strong understanding of and experience with Kubernetes or container orchestration (being a contributor, or familiarity with open source contribution, is a plus but not required).
- Strong system-level programming experience (e.g. C/Rust/Go).
- Experience building resilient and real-time systems. This can come from a range of backgrounds including but not limited to Distributed Systems, Robotics, HPC, Aerospace, Medical Devices.
- Familiarity with CPU/GPU optimization techniques.
Nice to Have
- Have helped build a service on a large cloud provider (AWS/GCP/Azure/etc).
- Have worked on a high-availability system.
- Familiar with being on call (our founders have experience being on call, and know how rough it is!).
- Familiarity with problems associated with deploying large-scale ML models or batch/scientific compute.
Even if you don’t meet all the above qualifications, we strongly encourage you to apply! We’re building some unique systems, and strongly believe in a multidisciplinary approach to engineering. Some of the most interesting challenges are solved by some of the most interesting people. We value people who are passionate, willing to learn and high velocity over any single qualification.