Job Title: Software Engineer III (Python/ML Systems Engineer)
Location: Remote (ideally Eastern Time; anywhere in North America acceptable)
Pay Rate: $100/hr on W-2
Duration: 1 year, with possibility for extension
About the Role:
- We are seeking a strong Python/ML Systems Engineer to join our Fundamental AI Research team, which focuses on research breakthroughs in AI. The role involves developing and maintaining deep learning libraries that support large-scale distributed training, open-sourcing high-quality code, writing documentation, supporting users, and bringing the latest research to production-scale AI systems.
- The chosen candidate will work with a diverse, interdisciplinary team of scientists, engineers, and cross-functional partners, with access to cutting-edge technology, resources, and research facilities.
Key Responsibilities:
- Design, implement, and improve machine learning systems and tools to support AI research.
- Develop and maintain deep learning libraries that support large-scale distributed training.
- Apply knowledge of relevant research domains and expert coding skills to platform and framework development.
- Write clean, robust machine learning code.
- Create documentation and help users onboard and use the tools effectively.
Minimum Qualifications:
- Degree in Computer Science, Computer Engineering, or a relevant technical field.
- 2+ years of experience developing machine learning systems in Python or C/C++.
- Experience with machine learning frameworks such as PyTorch or TensorFlow.
- Experience working with large datasets and data pipelines.
- Solid understanding of algorithms, data structures, and software engineering best practices.
- Ability to work collaboratively in a fast-paced, team-oriented environment.
- Excellent problem-solving and communication skills.
Preferred Qualifications:
- Demonstrated software engineering experience through work experience or contributions to widely used open-source repositories (e.g., on GitHub).
- Prior contributions to open-source AI/ML projects.
Must-Have Skills:
- 5+ years of Python experience.
- 2+ years of PyTorch experience.
- 0–2 years of distributed ML training experience (FSDP/DDP).
- 5+ years of experience with datasets and PyTorch DataLoader.
- 3+ years of open-source software contributions.
Nice-to-Have Skills:
- Experience contributing to widely used open-source AI/ML projects.
Disqualifiers:
- Candidates who have only general software engineering experience, with no large-scale model training experience in PyTorch, are not suitable.
Interview Process:
- 1–2 rounds of mostly technical interviews.
- Focus areas: distributed training, DDP/FSDP, parallelism techniques, memory/throughput calculations, PyTorch APIs, and C++ programming.
In accordance with the California Fair Chance Act and the Los Angeles County and San Francisco Fair Chance Ordinances, qualified applicants with arrest or conviction records will be considered. Certain criminal histories may impact the ability to perform key job duties, including adhering to policies, exercising judgment, managing stress, working safely, maintaining trustworthiness, meeting client standards, and protecting company operations and reputation.