Job Description: Senior MLOps Engineer
Location: Remote from Spain (Spanish employment contract)
About the Role
We are looking for an engineer to build and maintain ML/AI infrastructure on Google Cloud (GCP).
The main focus: make sure models (including LLMs) run reliably, securely, and in production — not just in researchers’ notebooks.
This is not a Data Scientist role, and not pure DevOps either. It sits at the intersection of Cloud + Data + ML.
Key Responsibilities
- Build and manage ML pipelines from training to production deployment (Vertex AI, Dataproc, Airflow/Cloud Composer).
- Set up CI/CD for ML: automated model deployment, testing, and updates.
- Manage infrastructure as code (Terraform, GitLab CI/CD).
- Containerization & orchestration (Docker, Kubernetes/GKE).
- Monitor models in production (Prometheus, Grafana, MLflow).
- Integrate LLMs (HuggingFace, OpenAI API, Gemini, LLaMA).
- Handle data workflows: BigQuery, Spark/Databricks, ETL pipelines.
- Ensure security & compliance (IAM, access control, audits).
Tech Stack
- Cloud: Google Cloud (Vertex AI, BigQuery, Dataproc, Composer, GKE).
- ML: TensorFlow, PyTorch, HuggingFace, MLflow.
- Data: Spark, Databricks, Airflow.
- Infra: Terraform, Docker, Kubernetes.
- CI/CD: GitLab, Jenkins (or similar).
- Monitoring: Prometheus, Grafana.
- Security: IAM, cloud/ML best practices.
What You Should Have
- Hands-on production experience with GCP (not just courses or theory).
- Solid understanding of how ML works in production: logging, monitoring, model versioning & rollback.
- Strong experience with Kubernetes & Terraform (must-have).
- Solid Python skills (ideally with ML libraries).
- Experience with LLMs is a strong plus.
Why This Role Is Challenging
- Many companies stop at PoC — here you’ll need to build real production-grade pipelines.
- LLM tooling is still maturing; deploying and monitoring these models reliably takes a cool head.
- GCP is less common than AWS/Azure — experience with Google Cloud is highly valued.
In one sentence:
This role is for an engineer who can turn researchers’ ML code into stable production services on GCP.