Overview
We are looking for a highly skilled Data Engineer with strong experience building AI-ready data pipelines, designing scalable cloud architectures, and implementing CI/CD for data and infrastructure workflows. This role sits at the intersection of data engineering, machine learning operations (MLOps), and cloud infrastructure, and will play a key part in turning raw data into production-grade AI systems.
Responsibilities
Data & AI Pipelines
- Design, build, and maintain end-to-end data pipelines supporting analytics and AI/ML use cases
- Develop feature pipelines and data transformations for model training, inference, and monitoring
- Implement batch and streaming pipelines using modern data stack tooling
- Ensure data quality, lineage, versioning, and observability across pipelines
AI / MLOps Enablement
- Build and operationalize model training and inference pipelines
- Integrate data pipelines with ML frameworks and orchestration tools
- Support model versioning, experiment tracking, and reproducible training
- Collaborate with data scientists and ML engineers to productionize models
Cloud & Architecture
- Design scalable, fault-tolerant cloud architectures for data and AI workloads
- Work with cloud services (AWS, GCP, or Azure) for storage, compute, orchestration, and networking
- Optimize cost, performance, and reliability of data infrastructure
- Make architectural decisions around data lakes, warehouses, feature stores, and serving layers
Infrastructure as Code & CI/CD
- Implement Infrastructure as Code (IaC) using tools like Terraform, Pulumi, or CloudFormation
- Build CI/CD pipelines for data workflows, ML pipelines, and infrastructure changes
- Automate testing, validation, and deployment of data and AI systems
- Enforce best practices around security, secrets management, and access control
Collaboration & Ownership
- Work cross-functionally with product, ML, and engineering teams
- Own systems end-to-end, from design through production and monitoring
- Document architectures, pipelines, and operational processes
Required Qualifications
- 4+ years of experience in data engineering or platform engineering
- Strong proficiency in Python (and/or Scala) for data and pipeline development
- Experience building AI/ML pipelines (training, inference, or feature engineering)
- Solid experience with cloud platforms (AWS, GCP, or Azure)
- Hands-on experience with CI/CD pipelines and Infrastructure as Code
- Strong understanding of data modeling, distributed systems, and pipeline orchestration
Preferred / Nice to Have
- Experience with orchestration tools (Airflow, Dagster, Prefect, Argo)
- Experience with data warehouses and lakes (Snowflake, BigQuery, Redshift, Delta/Iceberg)
- Familiarity with MLOps tools (MLflow, SageMaker, Vertex AI, Kubeflow)
- Experience with streaming systems (Kafka, Pub/Sub, Kinesis)
- Background in building production AI systems, not just experimental prototypes
What Success Looks Like
- Reliable, scalable data pipelines powering AI and analytics use cases
- Clean, automated deployments of data and ML infrastructure
- Reduced time from data ingestion to model production
- Clear, well-documented architectures that scale with the business