Job Summary:
As a Data & ML Support Engineer, you will play a crucial role in ensuring the reliability, performance, and scalability of data and machine learning operations within our organization.
You will work closely with data engineers, data scientists, MLOps engineers, and other technical teams to provide expert support and troubleshooting assistance for data and machine learning infrastructure, tools, and workflows. Your primary focus will be on diagnosing and resolving issues, optimizing processes, and implementing best practices to improve the efficiency of our data and ML operations.
Responsibilities:
1. Technical Support:
• Serve as the first point of contact for technical issues related to pipeline executions and deployments.
• Respond to support tickets, emails, and inquiries in a timely manner, and provide accurate and actionable solutions to users and stakeholders.
• Collaborate with cross-functional teams to address complex technical challenges and ensure timely resolution of issues.
2. Incident Management:
• Monitor system health and performance metrics to proactively identify and address potential issues before they impact operations.
• Investigate and troubleshoot incidents, including root cause analysis, impact assessment, and resolution implementation.
• Escalate critical issues to appropriate teams and stakeholders, and participate in incident response and post-incident reviews.
3. Documentation and Knowledge Sharing:
• Maintain comprehensive documentation of support procedures, troubleshooting guides, and best practices to facilitate knowledge sharing and training.
• Contribute to internal knowledge bases, wikis, and forums to ensure that support resources are up to date and accessible to all relevant parties.
4. Process Optimization:
• Identify opportunities to streamline and automate support processes, workflows, and tooling to improve efficiency and scalability.
• Work closely with Data/ML engineers and DevOps teams to implement automation scripts, monitoring solutions, and self-service tools for common support tasks.
5. Training and Education:
• Provide training and guidance to users, developers, and stakeholders on best practices, tools, and workflows.
• Conduct workshops, webinars, and tutorials to help internal teams develop the skills and knowledge required to effectively utilize machine learning infrastructure and tools.
6. Continuous Improvement:
• Collect feedback from users and stakeholders to identify areas for improvement in machine learning systems, processes, and user experience.
• Propose and implement enhancements, updates, and optimizations to meet evolving business requirements and industry standards.
Qualifications:
• 5 to 8 years of relevant experience and a Bachelor's degree in Computer Science, Engineering, or a related field.
• Proven experience in a technical support or operations role, preferably in a data-driven and machine learning environment.
• Strong understanding of ETL and of machine learning concepts, algorithms, and workflows.
• Proficiency in scripting and automation using languages such as Python.
• Experience with containerization technologies (Docker, Kubernetes), SQL, Airflow, Azure Databricks, APIs, Datadog, Git, and Azure cloud services (Compute, Storage, Networking, Identity and Access Management, Security & Compliance).
• Excellent analytical and problem-solving skills, with the ability to troubleshoot complex technical issues and perform root cause analysis.
• Strong communication and interpersonal skills, with the ability to collaborate effectively with diverse teams and stakeholders.
• A self-motivated, proactive attitude, with a commitment to continuous learning and professional development.
Languages:
• Fluent English communication skills, both written and spoken.
Preferred:
• Azure certifications, such as Azure AI Engineer Associate, Azure Data Scientist Associate, or Azure AI Fundamentals.
• Experience with MLOps tools and frameworks (e.g., MLflow, TensorFlow Extended).
• Experience with ML algorithms, data preparation, model monitoring (data drift and model performance), and retraining.
• Knowledge of data engineering concepts and tools for data preprocessing and feature engineering.
• Familiarity with DevOps practices and configuration management.
• Understanding of security best practices for machine learning systems and data privacy regulations.
Location: Remote
Contract Model: Service contract (PJ, independent contractor)