The Senior Software Engineer, Full Stack - Data Engineering Focus, will be a key contributor to our data platform. This role involves designing, developing, and maintaining scalable, robust data pipelines, data lakes, and lakehouse architectures. The ideal candidate will possess strong expertise in Python, experience with data warehousing solutions and big data processing frameworks (such as Snowflake or Databricks), and a solid understanding of modern software development practices. The role also requires basic knowledge of API development and of front-end technologies such as React for building analytical dashboards, as well as a passion for transforming raw data into actionable insights.
Key Responsibilities:
Data Pipeline Development & Optimization: Design, build, and maintain efficient, scalable, and reliable ETL/ELT data pipelines to ingest, process, and transform large volumes of structured and unstructured data from various sources.
Data Modeling & Architecture: Design and implement data models for optimal storage and retrieval. Play a key role in the architecture, design, and implementation of our data lake and lakehouse solutions, ensuring scalability, performance, and security.
Data Warehousing & Big Data Technologies: Develop and manage solutions using modern data warehousing platforms and big data technologies (e.g., Apache Spark, cloud-based data warehouses, distributed processing systems). Experience with platforms such as Snowflake or Databricks is highly valued.
Full-Stack Development for Data Applications: Develop and maintain APIs for data access and integration with other applications and services.
Analytical Dashboard & Visualization Development: Collaborate with data analysts and business stakeholders to understand requirements and create insightful and interactive analytical dashboards using front-end technologies (e.g., React) and BI tools.
Data Quality & Governance: Implement data quality checks, validation processes, and monitoring to ensure data accuracy, consistency, and reliability. Adhere to data governance best practices.
Performance Tuning & Optimization: Monitor and optimize the performance of data pipelines, queries, and data storage systems.
Collaboration & Mentorship: Work closely with cross-functional teams including data scientists, analysts, product managers, and other engineers. Mentor junior team members and promote best practices in data engineering and software development.
Innovation & Emerging Technologies: Stay current with the latest trends and technologies in data engineering, big data, cloud computing, and analytics, and proactively identify opportunities for innovation and improvement.
Required Qualifications:
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
5+ years of professional experience in software engineering, with a significant focus on data engineering.
Strong proficiency in Python for data processing, scripting, and application development.
Hands-on experience with one or more leading big data processing frameworks (e.g., Apache Spark, Apache Flink) and cloud-based data warehousing solutions (e.g., Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics). Demonstrable experience with platforms such as Databricks is a strong asset.
In-depth understanding and practical experience with data lake and lakehouse architectural patterns.
Experience with various data storage solutions (e.g., relational databases, NoSQL databases, object storage like AWS S3, Azure Blob Storage).
Solid understanding of ETL/ELT processes, data modeling principles, and data pipeline orchestration tools (e.g., Apache Airflow).
Basic knowledge of API development (RESTful APIs) and experience integrating data services.
Familiarity with front-end technologies, specifically React, for building user interfaces and analytical dashboards.
Experience in creating analytical dashboards and reports using BI tools (e.g., Tableau, Power BI, Looker) or custom front-end solutions.
Strong SQL skills and experience with query optimization.
Familiarity with version control systems (e.g., Git) and CI/CD practices.
Excellent problem-solving, analytical, and troubleshooting skills.
Strong communication and collaboration abilities.
Preferred Qualifications:
Experience with other programming languages such as Java or Scala.
Deep expertise in multiple cloud platforms (e.g., AWS, Azure, GCP) and their data services.
Knowledge of containerization technologies (e.g., Docker, Kubernetes).
Experience with streaming data technologies (e.g., Kafka, Kinesis).
Understanding of machine learning concepts and experience supporting MLOps workflows.
Contributions to open-source data projects.