Job Description
The consultant will be responsible for the following tasks:
• Collaborate with cross-functional teams (including Data Scientists, Analysts, Developers) to define data requirements, pipeline design, and solutions.
• Design, implement, and maintain scalable ETL pipelines using AWS Glue (Spark), Python, and PySpark.
• Manage complex data workflows and dependencies, including those involving AWS Lambda, using Airflow (or similar orchestration tools).
• Build and maintain cloud-native, scalable, and cost-effective data infrastructure on AWS, ensuring performance optimization.
• Integrate and optimize data flows across various AWS services like S3 (Glue Catalog, Athena), Aurora Postgres, Redshift, and Iceberg tables.
• Ensure data quality, governance, and compliance standards are met across all data processes.
• Take ownership of the end-to-end lifecycle of data pipelines, from design to deployment and monitoring.
• Collaborate closely with Data Science teams to operationalize models, leveraging AWS SageMaker where applicable.
• Ensure strong documentation practices and follow best practices for scalability, maintainability, and cost management.
Qualifications
- Master’s degree in a data-related field
- 5 years of experience in a similar role
Mandatory hard skills:
Python & Spark
- Proficiency in Python (including Pandas) for data transformations using a TDD approach.
- Hands-on experience with Apache Spark, ideally via AWS Glue.
Cloud Services (AWS)
- Experience with AWS services such as S3, Glue, Athena, Redshift, Aurora, Lambda, IAM, and EventBridge.
- Comfortable with cloud-based architecture, serverless design, and deployment strategies.
Data Workflow Orchestration
- Experience in building and managing DAGs in Airflow (or similar tools).
- Familiarity with Lambda-based event-driven architectures.
Software Engineering Practices
- Proficiency with Git (branching, pull requests, code reviews) and CI/CD pipelines.
- Understanding of release management and automated testing in a multi-environment setup.
Big Data Ecosystems
- Hands-on experience with distributed processing frameworks (like Spark) and large-scale data lake solutions, including Iceberg tables.
- Familiarity with data lake architectures, partitioning strategies, and best practices.
Hard skills that would be a real plus:
- Experience with dimensional modelling, partitioning strategies, and other data-modelling best practices.
- SQL knowledge for designing and maintaining data schemas that support efficient querying and reporting.
- Infrastructure as Code: Familiarity with tools like Terraform or CloudFormation for automated AWS provisioning.
- Event-Driven Architectures: Experience building event-driven architectures with AWS EventBridge, Lambda, SQS, and SNS.
Soft skills:
- Scrum Collaboration: Effective communication skills for Scrum team collaboration.
- Ownership & Accountability: Drive projects independently and take full responsibility for deliverables.
- Effective Communication: Ability to explain technical details to both technical and non-technical stakeholders.
- Strong Problem-Solving: Ability to dissect complex issues, propose scalable solutions, and optimize workflows.
- Team Collaboration: Comfortable working with cross-functional teams (Data Scientists, Developers, Analysts, etc.).
- Adaptability & Continuous Learning: Eagerness to explore emerging technologies and refine existing processes.