This position is part of a project for a world-renowned high-tech company, focused on building and improving the infrastructure that enables large-scale machine learning frameworks used across the industry. Engineers on this project ensure that complex C++ and Python codebases, including core ML frameworks and supporting libraries, remain stable, efficient, and well integrated as they evolve. The role combines software development with build engineering and CI/CD automation, with an emphasis on continuously improving build and integration workflows and contributing to the evolution of these frameworks and their ecosystem.
Responsibilities:
- Develop, refine, and enhance large-scale build configurations for C++ and Python projects
- Build and operate CI/CD pipelines to automate validation, testing, releases, and rollout of updates
- Investigate and resolve complex build, dependency, and integration issues across multiple repositories
- Implement code changes, fixes, and small features in ML frameworks and related libraries
- Ensure reproducible and hermetic builds using modern toolchains, caching, and distributed testing
- Manage and optimize containerized build and test environments (Docker)
- Collaborate with infrastructure, release, and ML engineering teams to ensure consistent integration and delivery
Minimum requirements:
- Strong proficiency in C++ and Python
- Experience with modern build systems at scale, such as Bazel (preferred); experience with other large build systems (Buck, Pants, CMake, or similar) is also valuable
- Hands-on experience with CI/CD automation (GitHub Actions preferred; Jenkins, Buildkite, or similar also relevant)
- Proficiency with Git or Mercurial, including complex rebases, cherry-picks, and patch workflows
- Strong Bash or shell scripting for automation and environment setup
- Familiarity with Docker or similar container technologies for build and test automation
- Detail-oriented, systematic approach to problem solving, with a focus on reliability and scalability
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
Would be a plus:
- Experience working with large open-source ML frameworks such as TensorFlow, PyTorch, or JAX
- Familiarity with GPU build and testing workflows or multi-architecture builds
- Exposure to distributed or hermetic build environments and remote execution
- Understanding of dependency graph analysis and build tooling such as Bazel query or cquery
We offer:
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package: medical insurance, vision, dental, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
About us:
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.