We are looking for a Software Engineer III with strong Data Engineering and MLOps experience.
About the Role
This is not a Data Scientist role; this position focuses on building production-grade data and ML pipelines and automating infrastructure on Azure Databricks. You will work at the intersection of data engineering, infrastructure automation, and machine learning, transforming prototype workflows into robust, observable, and cost-efficient production pipelines.
Responsibilities
- Design and maintain end-to-end data and ML pipelines using Databricks Workflows, Delta Lake, and Unity Catalog (bronze–silver–gold layers, schema evolution, access policies).
- Build reproducible training and deployment workflows integrated with experiment tracking, model registry, and artifact management tools.
- Implement data quality frameworks and observability metrics aligned with industry best practices.
- Build and monitor dashboards (Lakeview, Grafana, or similar) for data quality, model performance, and operational metrics.
- Automate data ingestion and feature generation jobs using PySpark, SQL Warehouses, and Databricks Asset Bundles under CI/CD (GitHub Actions or Azure DevOps).
- Manage access, security, and governance to ensure compliance and reliability.
- Optimize compute performance and cost (autoscaling, spot instances, cluster tuning, caching, partitioning).
- Develop automated pipelines for evaluation and data validation triggered by new telemetry data, ensuring reproducibility and traceability.
- Implement continuous monitoring for drift detection, feature stability, and prediction quality using Databricks observability tools or external frameworks.
- Ensure environment consistency and dependency management using Infrastructure as Code (Terraform or similar) and containerization (Docker or similar).
Qualifications
- Strong experience with Azure Databricks and PySpark
- Experience with MLOps / ML pipeline automation
- Experience with CI/CD and Infrastructure as Code
- Experience with data pipeline architecture in production environments
- Experience with monitoring, observability, and performance tuning
Compensation and Contract Details
Location: Poland (remote, occasional office visits possible)
Contract: B2B / Contract
Rate: Hourly, negotiable
Start date: Immediate or within two weeks preferred