We are looking for a skilled software engineer to deploy and manage machine learning models at scale. The ideal candidate will work on building and maintaining robust MLOps infrastructure using KServe and Kubernetes. You’ll be responsible for automating ML workflows and ensuring reliable model serving in production environments. This role involves close collaboration with data scientists and platform engineers to optimize performance and efficiency.
Responsibilities:
- Deploy, monitor, and manage ML models using KServe on Kubernetes.
- Build and maintain Kubernetes clusters on Google Kubernetes Engine (GKE) for scalable model serving.
- Develop, automate, and maintain ML workflows using Kubeflow Pipelines.
- Optimize performance and resource usage of deployed models.
- Collaborate with data scientists to productionize ML models.
- Set up monitoring and alerting for model health and latency.
- Ensure security, compliance, and reliability of ML infrastructure.
- Troubleshoot issues across the ML serving stack.
- Support continuous integration and deployment (CI/CD) for ML workflows.
- Document architecture, processes, and best practices.
Minimum Requirements:
- Hands-on experience with KServe, Kubernetes, and Docker.
- Strong proficiency in Java and Spring Boot.
- Experience with GCP (especially GKE) and Kubeflow Pipelines.
- Understanding of model performance management and real-time systems.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Nice to Have:
- GCP and/or Kubernetes certifications.
- Experience with ML observability or monitoring tools.
- Familiarity with deep learning models or recommendation systems.
- Exposure to data versioning and ML metadata tracking tools (e.g., MLflow, DVC).
- Experience optimizing inference performance for large-scale systems.
- Contributions to open-source MLOps tools or platforms.
What we offer:
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package: medical, vision, and dental insurance, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
NB:
Placement and staffing agencies need not apply. We do not offer corp-to-corp (C2C) arrangements at this time.
At this time, we are not able to process H-1B transfers. Applicants with CPT or OPT work authorization are welcome to apply.
About Us:
Grid Dynamics (Nasdaq: GDYN) is a digital-native technology services provider that accelerates growth and bolsters competitive advantage for Fortune 1000 companies. Grid Dynamics provides digital transformation consulting and implementation services in omnichannel customer experience, big data analytics, search, artificial intelligence, cloud migration, and application modernization. Grid Dynamics achieves high speed-to-market, quality, and efficiency by using technology accelerators, an agile delivery culture, and its pool of global engineering talent. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the US, UK, Netherlands, Mexico, and Central and Eastern Europe.
To learn more about Grid Dynamics, please visit www.griddynamics.com. Follow us on Facebook, Twitter, and LinkedIn.