Personal details

Guru K. - Remote data engineer

Guru K.

Senior Data Engineer
Based in: 🇮🇳 India
Timezone: Mumbai (UTC+5:30)

Summary

Experienced, results-oriented, and resourceful Senior Data Engineer with strong problem-solving and leadership skills, adaptable to challenging releases. Over 10 years of diverse experience in the information technology field, including the development and implementation of various applications in big data and analytics environments.

Skills:

• Worked extensively on building cloud ETL data pipelines that are well designed, robust, maintainable, tested, and optimized for resource utilization, using Agile methodology.

• Design and build customized, cutting-edge data engineering solutions by experimenting with new technologies while always keeping business priorities in mind.

• Strong in the big data tech stack: ETL data pipelines, data lakes, data warehouses, data mapping, parsing structured and unstructured data, Hadoop, Spark, Spark SQL, PySpark, Spark Streaming, Databricks, and Kafka.

• Programming languages: Python and Scala

• AWS Glue, AWS S3, Amazon Redshift data warehouse, AWS Lambda, AWS Kinesis, SNS, SQS, AWS EMR, CloudWatch, ECS

• CI/CD, RDBMS, Cassandra (NoSQL), Jira, Rally, Linux

Work Experience

Senior Data Engineer
Cognizant | Aug 2022 - Present
Apache Spark
Amazon Redshift
AWS (Amazon Web Services)

Working on a product-monitoring log aggregation project that takes in batch and streaming sources, building data pipelines on the AWS cloud by designing, developing, and implementing a cost-effective end-to-end ETL architecture in collaboration with the BI and analytics team.

Responsibilities:

• Design and develop AWS Glue PySpark jobs that take input from S3, apply transformations, and store the results in Redshift tables.

• Designed and developed a data lake for log data analytics on the AWS cloud, ingesting CSV, JSON, and database sources.

• Developed a real-time data ingestion layer using AWS Kinesis streaming, PySpark, and S3.

• Performed data cleansing and curation to make data available to the BI and analytics team.

• Managed Spark jobs using the Glue workflow scheduler.

• Used GitHub as the code repository.

• CI/CD using AWS CodePipeline.

Senior Data Engineer
General Electric (GE) | Aug 2019 - Jul 2022
MySQL
Apache Spark
Amazon Redshift
Apache Kafka
Software Architecture
AWS (Amazon Web Services)

Worked on user data analytics, which involved migrating an existing Talend data processing system to the AWS cloud with a cost-effective architecture.

Responsibilities:

• Designed and developed AWS Glue PySpark jobs that take input from S3 and MySQL, apply transformations, and store the processed data in Redshift and Spectrum tables.

• Designed and developed a data lake for log data analytics on the AWS cloud.

• Worked on data mapping for the Talend migration to the AWS cloud.

• Built a Redshift data warehouse for log analytics.

• Developed a real-time data ingestion layer using Kafka, AWS EC2, AWS Glue streaming, SNS, and S3.

• Parsed CSV, XML, JSON, and log files using Glue and PySpark at ingestion time.

• Used GitHub as the code repository.

• Orchestration with Apache Airflow.

• CI/CD using AWS CodePipeline.

Education

Visvesvaraya Technological University (VTU), Belgaum, India
Bachelor's degree・Computer Science and Engineering
Aug 2002 - Sep 2007