Job Description
Profile Summary
- Data Engineer with 5 years of experience in Big Data technologies.
- Proven expertise in designing and developing scalable data pipelines using Apache Spark (PySpark).
- Proficient in Python programming.
- Hands-on experience with Databricks, MongoDB, Docker, and Kubernetes.
- Solid understanding of DevOps practices and tools.
Primary Skills
- Design, develop, and maintain scalable and reliable data pipelines.
- Translate business requirements and data models into efficient ETL code.
- Work independently on Spark-based data processing workflows.
- Skilled in writing complex business logic using PySpark and Spark SQL, structured as reusable, modular functions (see the first sketch after this list).
- Strong understanding of Spark clusters, distributed computing, and parallel processing.
- Capable of writing unit test cases to validate ETL logic (see the pytest-style example after this list).
- Experience in building reusable functions and applying modular programming approaches.
- Worked with AWS Athena for table creation, partitioning, and writing complex SQL queries (see the Athena sketch after this list).
- Experience working with both relational databases (e.g., Oracle, SQL Server) and NoSQL databases (e.g., MongoDB) for data extraction and loading (a JDBC/MongoDB read sketch follows this list).
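A minimal sketch of the kind of modular PySpark/Spark SQL business logic described above. All names here (add_order_metrics, line_total, the sample columns) are illustrative, not taken from any actual codebase:

```python
# Illustrative reusable transform; every name is hypothetical.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

def add_order_metrics(orders: DataFrame) -> DataFrame:
    """A small, reusable unit of business logic: derive a line total
    and flag high-value orders."""
    return (
        orders
        .withColumn("line_total", F.col("quantity") * F.col("unit_price"))
        .withColumn("is_high_value", F.col("line_total") > F.lit(1000.0))
    )

# The same logic is reachable from Spark SQL via a temp view.
orders_df = spark.createDataFrame(
    [(1, 3, 450.0), (2, 1, 80.0)],
    ["order_id", "quantity", "unit_price"],
)
add_order_metrics(orders_df).createOrReplaceTempView("order_metrics")
spark.sql(
    "SELECT order_id, line_total FROM order_metrics WHERE is_high_value"
).show()
```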
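Unit testing such ETL logic might look like the following pytest sketch, assuming add_order_metrics lives in a hypothetical etl.transforms module:

```python
# Hypothetical pytest case for the transform above; the module path is assumed.
import pytest
from pyspark.sql import SparkSession

from etl.transforms import add_order_metrics  # hypothetical module

@pytest.fixture(scope="session")
def spark():
    # Small local session so the suite runs without a cluster.
    return (
        SparkSession.builder.master("local[1]").appName("etl-tests").getOrCreate()
    )

def test_high_value_flag(spark):
    df = spark.createDataFrame(
        [(1, 5, 300.0)], ["order_id", "quantity", "unit_price"]
    )
    row = add_order_metrics(df).first()
    assert row["line_total"] == 1500.0
    assert row["is_high_value"]
```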
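For the Athena item, a hedged boto3 sketch of creating a partitioned external table; the bucket, database, and table names are placeholders. Athena relies on partitioning rather than conventional indexes:

```python
# Sketch only: placeholders throughout. Athena DDL runs as a query.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
    order_id   bigint,
    line_total double
)
PARTITIONED BY (order_date string)
STORED AS PARQUET
LOCATION 's3://example-bucket/orders/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "sales"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```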
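And a sketch of the dual extraction pattern, assuming an Oracle JDBC driver and the MongoDB Spark connector (v10.x, which registers the "mongodb" format) are on the classpath; every URI and credential below is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("extract-sketch").getOrCreate()

# Relational extraction over JDBC (placeholder connection details).
oracle_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# NoSQL extraction via the MongoDB Spark connector (placeholder URI).
mongo_df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb://db-host:27017")
    .option("database", "sales")
    .option("collection", "orders")
    .load()
)
```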
Secondary Skills
- Experience in creating AWS IAM roles and defining access policies (see the boto3 sketch after this list).
- Familiar with Docker image creation and customization.
- Experience with Camunda for workflow orchestration and automation.
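A hedged boto3 sketch of the IAM item above: creating a role a data service could assume, then attaching an inline S3 read policy. The role name, service principal, and ARNs are placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: which service may assume the role (placeholder principal).
assume_role_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="etl-pipeline-role",  # placeholder
    AssumeRolePolicyDocument=json.dumps(assume_role_doc),
)

# Inline access policy: read-only S3 against a placeholder bucket.
iam.put_role_policy(
    RoleName="etl-pipeline-role",
    PolicyName="s3-read-only",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }],
    }),
)
```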
Additional Expectations
- Strong communication skills.
- Proficient in troubleshooting and problem-solving.
- Demonstrates a continuous learning mindset.
- Focused on automating manual tasks through coding wherever feasible.
Technical Skills
Mandatory Skills:
Python (for data engineering), Apache Spark, Spark SQL, Java, Scala, Databricks