Experience: 3–4 years
Location: On-site — Gurgaon, India
Employment Type: Full-time
About tracebloc
tracebloc is a Berlin-based AI startup building tooling for data scientists, allowing them to evaluate and benchmark third-party AI models without exposing their data. We recently received $2.5M in funding and are aiming to build the category leader in AI model discovery.
About the Role
We are seeking a highly capable and self-driven Junior Data Scientist with 3–4 years of experience to take full ownership of building a production-grade ML platform — not just individual models. This role is pipeline-centric, focusing on designing, implementing, and maintaining robust data science workflows that enable scalable, repeatable, and automated model training and deployment.
This is an on-site role based in Gurgaon, where you will independently architect, build, deploy, and maintain the entire ML infrastructure.
Key Responsibilities
- Independently own and manage the full ML system lifecycle, from feature engineering and model development to production deployment and maintenance.
- Design and build scalable, automated pipelines for data ingestion and processing, model training at scale, evaluation and monitoring, and retraining workflows.
- Build and maintain a platform-oriented architecture enabling multiple ML workflows with extensibility and reuse in mind.
- Deploy and operate ML workloads on AWS and/or Azure cloud infrastructure using containerized environments.
- Implement robust CI/CD pipelines for automated testing, deployment, and rollback of ML services.
- Write clean, production-grade, well-tested Python code supporting these workflows.
- Maintain operational excellence for deployed pipelines: monitor performance, diagnose issues, and drive continuous improvement.
Required Skills
- Strong hands-on experience in Python, with proficiency in ML libraries such as scikit-learn, pandas, NumPy, PyTorch, and TensorFlow.
- Experience in building end-to-end ML pipelines as engineered software systems (not notebooks or isolated scripts).
- Practical experience with ML workflow orchestration tools such as Airflow, Kubeflow Pipelines, or SageMaker Pipelines (preferred but not mandatory).
- Deep understanding of pipeline design patterns and best practices for production environments.
- Experience in building ML pipelines for computer vision and NLP tasks.
Good-to-Have Skills
- Hands-on experience with AWS and/or Azure cloud services for data science workloads.
- Understanding of and experience with Kubernetes and Docker.
- Experience setting up and maintaining CI/CD pipelines for ML deployments.
- Ability to write and maintain unit tests, integration tests, and validation tests for ML pipelines and APIs.
- Prior work on platform architecture for multi-tenant ML workflows.