Data Pipeline Using Snowflake and Airflow
2022
Amazon EC2
Snowflake
Apache Airflow
AWS (Amazon Web Services)
- Transferred data logs from Amazon EC2 to Amazon S3 using Amazon Kinesis
- Built a data ingestion pipeline as an Apache Airflow DAG in Python, running on Amazon Managed Workflows for Apache Airflow (MWAA)
- Loaded data into the Snowflake data warehouse
- Handled all role management in AWS IAM and Snowflake
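As a rough illustration of the Snowflake load step, the statement an Airflow task might submit can be sketched as below. This is only a sketch under stated assumptions: the table name `raw.app_logs`, the external stage `logs_s3_stage`, and the helper `build_copy_into` are hypothetical and not taken from the project, and the target table is assumed to have a single VARIANT column to receive the JSON records.

```python
def build_copy_into(table: str, stage: str, prefix: str = "", file_type: str = "JSON") -> str:
    """Return a Snowflake COPY INTO statement that loads staged files into a table.

    `table`, `stage`, and `prefix` are hypothetical names chosen for illustration.
    """
    # An external stage pointing at the S3 bucket is referenced as @stage_name.
    location = f"@{stage}/{prefix}" if prefix else f"@{stage}"
    return (
        f"COPY INTO {table}\n"
        f"FROM {location}\n"
        f"FILE_FORMAT = (TYPE = '{file_type}')"
    )


# Example: load the log files staged under the 2022/ prefix.
sql = build_copy_into("raw.app_logs", "logs_s3_stage", "2022/")
print(sql)
```

Keeping the statement in a small function makes it easy to reuse the same load logic for different prefixes, for example one prefix per scheduled DAG run.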
Yelp Dataset Analysis with Spark on Azure Databricks
2022
JSON
Apache Spark
Databricks
Azure Data Factory
- Set up Azure Resource Manager resources, followed by a storage account and containers, to upload the dataset
- Created a pipeline to copy data from Azure Storage to Azure Data Lake Storage (ADLS) using Azure Data Factory
- Created an Azure Databricks workspace and cluster, then configured access to ADLS from Databricks
- Converted the data files from JSON to Parquet and then to Delta format for efficient analysis
- Loaded the dataset into Spark DataFrames
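The JSON to Parquet to Delta conversion described above could look roughly like the following PySpark sketch. It assumes a notebook or job where a `SparkSession` is available and the cluster already has access to ADLS and Delta Lake, as on Databricks. The helper names `adls_path` and `json_to_delta`, and any container, account, or file names passed to them, are hypothetical and not taken from the project.

```python
def adls_path(container: str, account: str, relative_path: str) -> str:
    """Build an abfss:// URI for a file in ADLS Gen2 (names are hypothetical)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def json_to_delta(spark, json_path: str, parquet_path: str, delta_path: str):
    """Convert a JSON dataset to Parquet, then to Delta, and return a DataFrame.

    `spark` is an existing SparkSession (predefined in Databricks notebooks).
    """
    # Read the raw JSON; Spark expects one JSON object per line by default.
    raw_df = spark.read.json(json_path)

    # Write a columnar Parquet copy of the data.
    raw_df.write.mode("overwrite").parquet(parquet_path)

    # Re-read the Parquet copy and store it in Delta format.
    parquet_df = spark.read.parquet(parquet_path)
    parquet_df.write.format("delta").mode("overwrite").save(delta_path)

    # Load the Delta table as a DataFrame for analysis.
    return spark.read.format("delta").load(delta_path)
```

In a Databricks notebook this might be called as `json_to_delta(spark, adls_path("yelp", "mystorageacct", "raw/business.json"), ...)`, with the remaining two paths pointing at the Parquet and Delta output folders.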