Job Description
The consultant will be responsible for the following tasks:
• Collaborate with cross-functional teams (including Data Scientists, Analysts, Developers) to define data requirements, pipeline design, and solutions.
• Design, implement, and maintain scalable ETL pipelines using AWS Glue (Spark), Python, and PySpark.
• Manage complex data workflows and dependencies, including those involving AWS Lambda, using Airflow (or similar orchestration tools).
• Build and maintain cloud-native, scalable, and cost-effective data infrastructure on AWS, ensuring performance optimization.
• Integrate and optimize data flows across various AWS services like S3 (Glue Catalog, Athena), Aurora Postgres, Redshift, and Iceberg tables.
• Ensure data quality, governance, and compliance standards are met across all data processes.
• Take ownership of the end-to-end lifecycle of data pipelines, from design to deployment and monitoring.
• Collaborate closely with Data Science teams to operationalize models, leveraging AWS SageMaker where applicable.
• Ensure strong documentation practices and follow best practices for scalability, maintainability, and cost management.
Qualifications
- Master’s degree in a data-related field
- 5 years of experience in a similar role
Mandatory hard skills:
Python & Spark
- Proficiency in Python (including Pandas) for data transformations using a TDD approach.
- Hands-on experience with Apache Spark, ideally via AWS Glue.
Cloud Services (AWS)
- Experience with AWS services such as S3, Glue, Athena, Redshift, Aurora, Lambda, IAM, and EventBridge.
- Comfortable with cloud-based architecture, serverless design, and deployment strategies.
Data Workflow Orchestration
- Experience in building and managing DAGs in Airflow (or similar tools).
- Familiarity with Lambda-based event-driven architectures.
Software Engineering Practices
- Proficiency with Git (branching, pull requests, code reviews) and CI/CD pipelines.
- Understanding of release management and automated testing in a multi-environment setup.
Big Data Ecosystems
- Hands-on experience with distributed processing frameworks (like Spark) and large-scale data lake solutions, including Iceberg tables.
- Familiarity with data lake architectures, partitioning strategies, and best practices.
Hard skills that would be a real plus:
- Experience with dimensional modelling, partitioning strategies, and other data-modelling best practices.
- SQL knowledge for designing and maintaining data schemas that support efficient querying and reporting.
- Infrastructure as Code: Familiarity with tools like Terraform or CloudFormation for automated AWS provisioning.
- Event-Driven Architectures: Experience building event-driven architectures with AWS EventBridge, Lambda, SQS, and SNS.
Soft skills:
- Scrum Collaboration: Effective communication skills for Scrum team collaboration.
- Ownership & Accountability: Drive projects independently and take full responsibility for deliverables.
- Effective Communication: Ability to explain technical details to both technical and non-technical stakeholders.
- Strong Problem-Solving: Ability to dissect complex issues, propose scalable solutions, and optimize workflows.
- Team Collaboration: Comfortable working with cross-functional teams (Data Scientists, Developers, Analysts, etc.).
- Adaptability & Continuous Learning: Eagerness to explore emerging technologies and refine existing processes.