As part of the Data Engineering team, you will be responsible for the design, development, and operations of large-scale data systems operating at petabyte scale. You will focus on real-time data pipelines, streaming analytics, distributed big data, and machine learning infrastructure. You will interact with engineers, product managers, BI developers, and architects to provide scalable, robust technical solutions.
Experience with agile development models
Design, develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data, focusing on scalability, low latency, and fault tolerance in every system built.
Experience with Java and Python for writing data pipelines and data processing layers
Experience in Airflow & Github.
Experience writing MapReduce jobs.
Demonstrated expertise in writing complex, highly optimized queries across large data sets
Proven, working expertise with big data technologies: Hadoop, Hive, Kafka, Presto, Spark, and HBase.
Highly proficient in SQL.
Experience with cloud technologies (GCP, Azure)
Experience with relational and in-memory data stores desirable (Oracle, Cassandra, Druid)
Provide and support the implementation and operation of data pipelines and analytical solutions
Experience with performance tuning of systems that work with large data sets
Experience with REST API data services for data consumption
Retail experience is a huge plus.
Bachelor's degree or engineering degree in one of the following areas:
OR at least 4 years of verifiable experience in roles related to: