Job Description
Develop, implement, and optimize complex Data Warehouse (DWH) and Data Lakehouse solutions on the Databricks platform (including Delta Lake, Unity Catalog, and Apache Spark) to provide a scalable, high-performance, and governed data foundation for analytics, reporting, and Machine Learning.
Responsibilities
A. Databricks Development and Architecture
- Advanced Design and Implementation: Design and implement robust, scalable, and high-performance ETL/ELT data pipelines using PySpark/Scala and Databricks SQL on the Databricks platform.
- Delta Lake: Implement and optimize the Medallion architecture (Bronze, Silver, Gold) on Delta Lake to ensure data quality, consistency, and historical tracking.
- Lakehouse Platform: Implement the Lakehouse architecture efficiently on Databricks, combining best practices from the DWH and Data Lake worlds.
- Performance Optimization: Optimize Databricks clusters, Spark jobs, and Delta tables (e.g., Z-ordering, compaction, query tuning) to reduce latency and compute costs.
- Streaming: Design and implement real-time/near-real-time data processing solutions using Spark Structured Streaming and Delta Live Tables (DLT); a minimal sketch of such a pipeline follows this list.
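To give a flavor of the day-to-day work, here is a minimal PySpark sketch of a Bronze-to-Silver step with Auto Loader plus a table optimization; all paths, table names, and columns are hypothetical, and Auto Loader (`cloudFiles`) requires the Databricks Runtime:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created in Databricks notebooks

# Bronze: incrementally ingest raw JSON files with Auto Loader (Structured Streaming)
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders/schema")  # hypothetical path
    .load("/mnt/raw/orders")                                               # hypothetical path
)

(bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/orders/bronze")
    .trigger(availableNow=True)          # process all pending files, then stop
    .toTable("main.bronze.orders"))      # hypothetical Unity Catalog table

# Silver: clean and deduplicate into a curated Delta table
(spark.table("main.bronze.orders")
    .where(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
    .write.mode("overwrite")
    .saveAsTable("main.silver.orders"))

# Co-locate data for common query predicates and compact small files
spark.sql("OPTIMIZE main.silver.orders ZORDER BY (customer_id)")
```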
B. Governance and Security
- Unity Catalog: Implement and manage Unity Catalog for centralized data governance, fine-grained security (row/column-level security), and data lineage (see the example after this list).
- Data Quality: Define and implement data quality standards and rules (e.g., using DLT or Great Expectations) to maintain data integrity.
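As an example of fine-grained security in Unity Catalog, a minimal sketch of a grant plus a row filter; the catalog, schema, table, column, and group names are all hypothetical assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Databricks notebooks

# Grant read access on a curated table to an account group (hypothetical names)
spark.sql("GRANT SELECT ON TABLE main.gold.sales TO `analysts`")

# Row-level security: a boolean function applied as a row filter on the region column
spark.sql("""
    CREATE OR REPLACE FUNCTION main.gold.emea_only(region STRING)
    RETURN IF(is_account_group_member('emea_analysts'), region = 'EMEA', TRUE)
""")
spark.sql("ALTER TABLE main.gold.sales SET ROW FILTER main.gold.emea_only ON (region)")
```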
C. Operations and Collaboration
- Orchestration: Develop and manage complex workflows using Databricks Workflows (Jobs) or external tools (e.g., Azure Data Factory, Airflow) to automate pipelines (see the sketch after this list).
- DevOps/CI/CD: Integrate Databricks pipelines into CI/CD processes using tools like Git, Databricks Repos, and Bundles.
- Collaboration: Work closely with Data Scientists, Analysts, and Architects to understand business requirements and deliver optimal technical solutions.
- Mentorship: Provide technical guidance and mentorship to junior developers and promote best practices.
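As an illustration of orchestration-as-code, a minimal sketch using the Databricks Python SDK (`databricks-sdk`) to define a two-task workflow; the job name, notebook paths, and cluster ID are hypothetical:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up authentication from the environment or CLI profile

# Hypothetical two-task workflow: transform runs only after ingest succeeds
job = w.jobs.create(
    name="nightly-orders-pipeline",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/ingest"),
            existing_cluster_id="0000-000000-example",  # hypothetical cluster ID
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/transform"),
            existing_cluster_id="0000-000000-example",
        ),
    ],
)
print(f"Created job {job.job_id}")
```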
Qualifications
A. Mandatory Knowledge (Expert Level)
- Databricks Platform: Proven, expert-level experience with the entire Databricks ecosystem (Workspace, Cluster Management, Notebooks, Databricks SQL).
- Apache Spark: In-depth knowledge of Spark architecture (RDD, DataFrames, Spark SQL) and advanced optimization techniques.
- Delta Lake: Expertise in implementing and managing Delta Lake (ACID properties, Time Travel, MERGE, OPTIMIZE, VACUUM); a minimal sketch follows this list.
- Programming Languages: Advanced/expert-level proficiency in Python (with PySpark) and/or Scala (with Spark).
- SQL: Advanced/expert-level skills in SQL and Data Modeling (Dimensional, 3NF, Data Vault).
- Cloud: Solid experience with a major Cloud platform (AWS, Azure, or GCP), especially with storage services (S3, ADLS Gen2, GCS) and networking.
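To make the Delta Lake expectations concrete, a minimal PySpark sketch of MERGE, Time Travel, and VACUUM; the table names and retention window are hypothetical:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Databricks notebooks
updates = spark.table("main.staging.customer_updates")  # hypothetical staging table

# Upsert: ACID MERGE of staged changes into the target Delta table
target = DeltaTable.forName(spark, "main.silver.customers")
(target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Time Travel: read the table as of an earlier version for audit or debugging
v0 = spark.read.option("versionAsOf", 0).table("main.silver.customers")

# Housekeeping: remove data files no longer referenced by versions older than 7 days
spark.sql("VACUUM main.silver.customers RETAIN 168 HOURS")
```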
B. Additional Knowledge (Major Advantage)
- Unity Catalog: Hands-on experience with implementing and managing Unity Catalog.
- Lakeflow: Experience with Delta Live Tables (DLT) and Databricks Workflows (now branded under Lakeflow as Declarative Pipelines and Jobs, respectively).
- ML/AI Concepts: Understanding of basic MLOps concepts and experience with MLflow to facilitate integration with Data Science teams (see the sketch after this list).
- DevOps: Experience with Terraform or equivalent tools for Infrastructure as Code (IaC).
- Certifications: Databricks certifications (e.g., Databricks Certified Data Engineer Professional) are a significant advantage.
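For the MLOps point, the expected familiarity is roughly at the level of the following minimal MLflow sketch; the experiment path, parameter, and metric value are placeholders, not real results:

```python
import mlflow

mlflow.set_experiment("/Shared/churn-model")  # hypothetical experiment path

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 5)   # hypothetical hyperparameter
    mlflow.log_metric("auc", 0.87)     # placeholder value for illustration only
```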
C. Education and Experience
- Education: Bachelor’s degree in Computer Science, Engineering, Mathematics, or a relevant technical field.
- Professional Experience: 5+ years of experience in Data Engineering, including 3+ years working with Databricks and Spark at scale.
Additional Information
Benefits
- Full access to a foreign-language learning platform
- Personalized access to tech learning platforms
- Tailored workshops and trainings to sustain your growth
- Medical insurance
- Meal tickets
- Monthly budget to spend on a flexible benefits platform
- Access to 7 Card services
- Wellbeing activities and gatherings
Working model: hybrid (2 days at the office)