Azure Migration and ETL
2021
Python
Azure
Apache Spark
Azure Data Factory
Migrated on-premises services and data pipelines to Azure. Designed and implemented frameworks for data quality and data movement.
Technologies: PySpark, Azure Data Factory, Azure Data Lake, Python, Azure Databricks
Implemented transformation frameworks that can move data from various sources into Azure Data Lake.
Designed and developed a rule-based data quality framework.
Document Indexing and Searching
2020
Built a data pipeline to collect and refresh data from various sources using AWS services.
Technologies: Python, AWS Glue, Amazon Comprehend, Amazon Textract, Amazon CloudSearch
Retrieved data from various sources and loaded it into a data warehouse.
Indexed the documents using NLP techniques and exposed the data for fast lookup.