Role Overview
We are seeking an experienced DevOps / MLOps Engineer to become a core member of our technical team. This role focuses on owning the infrastructure and deployment processes for our production machine learning systems. You will ensure our prediction pipelines run reliably, deploy smoothly, and scale effectively. We are looking for someone who is passionate about automation, system reliability, and continuous improvement.
Responsibilities
- Maintain and improve deployment scripts and CI/CD workflows to ensure smooth and reliable code deployment from commit to production.
- Manage deployments for model fitting, data cleansing, and signal discovery services on Cloud Run, optimizing for performance and scalability.
- Oversee PostgreSQL and Streamlit instances on GCP VMs, ensuring instance management, updates, backups, and security are handled efficiently.
- Manage container images in GitHub Container Registry and Google Artifact Registry, including implementing cleanup policies and access control.
- Develop robust monitoring and alerting systems with comprehensive logging, metrics, and health checks to preemptively identify and resolve issues.
- Manage configuration consistently across environments, maintaining parity where necessary and handling secrets securely.
Required Skills
- Docker: Extensive experience with containerization, including efficient Dockerfile creation, layer caching, multi-stage builds, and container issue debugging.
- Google Cloud Platform: Proficient in using Cloud Run, Compute Engine, Artifact Registry, Cloud Storage, and IAM, with a preference for scripting over manual configuration.
- CI/CD: Proficient in building and maintaining deployment pipelines with GitHub Actions, with a strong understanding of continuous integration and deployment.
- Linux Administration: Skilled in command-line operations, service management, and Bash scripting.
- PostgreSQL: Basic database administration skills, including backups, monitoring, and performance tuning.
Nice to Have
- Infrastructure as Code: Experience with tools like Terraform or Pulumi to ensure infrastructure is versioned, reviewed, and reproducible.
Current Environment
- Our setup includes GCP services like Cloud Run, Compute Engine, and Artifact Registry, alongside Docker, Docker Compose, GitHub Actions, PostgreSQL 16, and Bash deployment scripts with Python wrappers.
Engineering Standards
- We emphasize automation, thorough documentation, a security-first mindset, and a focus on reliability. We expect system changes to be versioned and reproducible, with clear recovery procedures in place.
Education
- A university degree in Computer Science, Engineering, or a related field is preferred, though equivalent demonstrated expertise will also be considered.
What Success Looks Like
- Deployments occur without drama or surprises, systems recover automatically from failures, and engineers deploy confidently. Infrastructure changes are documented, versioned, and reproducible; costs stay reasonable and resources are appropriately scaled.