Personal details

Guru K. - Remote data engineer

Guru K.

Senior Data Engineer
Based in: 🇮🇳 India
Timezone: Mumbai (UTC+5:30)

Summary

Experienced, results-oriented, and resourceful Senior Data Engineer with strong problem-solving and leadership skills, adaptable to challenging releases. Over 10 years of diverse experience in the information technology field, including the development and implementation of various applications in big data and analytics environments.

Skills:

• Worked extensively on building cloud ETL data pipelines that are well designed, robust, maintainable, tested, and optimized for resource utilization, using Agile methodology.

• Design and build customized, cutting-edge data engineering solutions by experimenting with new technologies while always keeping business priorities in mind.

• Strong in the big data tech stack: ETL data pipelines, data lakes, data warehouses, data mapping, parsing structured and unstructured data, Hadoop, Spark, Spark SQL, PySpark, Spark Streaming, Databricks, and Kafka.

• Programming languages: Python and Scala

• AWS Glue, AWS S3, Amazon Redshift data warehouse, AWS Lambda, AWS Kinesis, SNS, SQS, AWS EMR, CloudWatch, ECS

• CI/CD, RDBMS, Cassandra (NoSQL), Jira, Rally, Linux

Work Experience

Senior Data Engineer
Cognizant | Aug 2022 - Present
Apache Spark
Amazon Redshift
AWS (Amazon Web Services)

Working on a product-monitoring log aggregation project that takes in batch and streaming sources, building data pipelines on the AWS cloud by designing, developing, and implementing a cost-effective end-to-end ETL architecture in collaboration with the BI and analytics team.

Responsibilities:

• Design and develop AWS Glue PySpark jobs that take input from S3, apply transformations, and store the results in Redshift tables.

• Designed and developed a data lake for log data analytics on the AWS cloud, ingesting CSV, JSON, and database sources.

• Developed a real-time data ingestion layer using AWS Kinesis streaming, PySpark, and S3.

• Performed data cleansing and curation to make data available to the BI and analytics team.

• Managed Spark jobs using the Glue workflow scheduler.

• Used GitHub as the code repository.

• CI/CD using AWS CodePipeline.

Senior Data Engineer
General Electric (GE) | Aug 2019 - Jul 2022
MySQL
Apache Spark
Amazon Redshift
Apache Kafka
Software Architecture
AWS (Amazon Web Services)

Worked on user data analytics, which involved migrating an existing Talend data processing system to the AWS cloud with a cost-effective architecture.

Responsibilities:

• Designed and developed AWS Glue PySpark jobs that take input from S3 and MySQL, apply transformations, and store the processed data in Redshift and Spectrum tables.

• Designed and developed a data lake for log data analytics on the AWS cloud.

• Worked on data mapping for the Talend migration to the AWS cloud.

• Built a Redshift data warehouse for log analytics.

• Developed a real-time data ingestion layer using Kafka, AWS EC2, AWS Glue streaming, SNS, and S3.

• Parsed CSV, XML, JSON, and log files using Glue and PySpark at ingestion time.

• Used GitHub as the code repository.

• Orchestration with Apache Airflow.

• CI/CD using AWS CodePipeline.

Education

Visvesvaraya Technological University (VTU), Belgaum, India
Bachelor's degree・Computer Science and Engineering
Aug 2002 - Sep 2007