I am a Senior AWS Data Engineer with expertise in Big Data, NoSQL, databases, programming, cloud services, and tools. I have led successful projects in data engineering and machine learning.
As part of the data practice, responsible for assessing applications, identifying data lineage and requirements, and defining the strategy, architecture, implementation plan, and delivery of data-centric applications, establishing the long-term strategy and short-term scope for multi-phased Big Data applications and cloud data warehouses.
Led the migration of the vehicle manufacturing planning process ETL logic from Glue to EMR, resulting in an annual cost saving of $400,000 and a 33% increase in integration performance.
Maintained 95% uptime while ingesting streaming and transactional data from 30 different data sources using PySpark and S3, processing billions of rows daily.
Designed and developed a serverless analytics pipeline on AWS using AWS Lambda, S3, AWS Glue, PySpark, DynamoDB, and Athena.
Developed real-time ingestion pipelines for connected-vehicle data using Confluent Kafka, S3 Sink connectors, Glue, and AWS Lambda.
Trained BMW's internal team on reusable Glue and EMR PySpark blueprints for analytics and machine learning data pipelines.
Designed and built a machine learning model to assist with de-duplicating engineering data pipelines, reducing ingest cost by more than half.
Developed data pipelines for new-vehicle electric battery performance data using EMR blueprints, PySpark, and SageMaker.
Led a team of 10 Data Scientists as the New Initiatives Lead within the ABSA Rest of Africa (R.O.A) Analytics team.
Developed the customer Next Best Product recommendation machine learning model using the Apriori algorithm, running in a Docker container.
Developed geo-analytics on customer and channel geoinformation to support the bank's footprint-reduction strategy with better-informed analysis, using Kepler (Uber open-source) for visualisation.