Personal details

Lim H.

Data Engineer (Remote)
Based in: 🇲🇾 Malaysia
Timezone: Kuala Lumpur (UTC+8)

Summary

I have over 14 years of experience in the IT industry, primarily focused on developing systems with .NET (C#) and MSSQL. Throughout my career, I have designed and implemented a wide range of systems: an equipment management system for the semiconductor industry, EDI programs for container and vessel movement in logistics, survey and online purchasing systems for a club house, trading software in MQL4, ERPs for distributors, clinics, and plantations, a property management system, a facilities booking system, and a queue management system for the food and beverage industry.

Currently, I work as a data engineer for a multinational company. My strengths are a solution-oriented mindset and a habit of continuously updating my knowledge of the latest IT technologies and tools, which allows me to serve my clients with innovative and efficient solutions. I am also a web scraping expert and have completed more than 100 web scraping and data sourcing projects, ranging from simple to highly complex.

Work Experience

Data Engineer (Remote)
Glints | Aug 2022 - Present
Python
SQL
PostgreSQL
Web Scraping
Google BigQuery
Zapier
Airflow
Google Cloud Run

Completed 100+ web scraping and external data sourcing projects for lead generation and competitor analysis purposes.

Designed and executed data unification and deduplication initiatives using BigQuery, Airflow, and PostgreSQL to ensure data accuracy and consistency across platforms.
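
For illustration, a minimal sketch of a deduplication pass of this kind, assuming a hypothetical leads.contacts table in BigQuery and the google-cloud-bigquery client; the real pipelines, key columns, and table names differed per project:

    # Deduplicate a hypothetical contacts table, keeping the most recently
    # updated row per email address (table and column names are illustrative).
    from google.cloud import bigquery

    client = bigquery.Client()

    DEDUP_SQL = """
    CREATE OR REPLACE TABLE `my-project.leads.contacts_dedup` AS
    SELECT * EXCEPT(row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (
          PARTITION BY LOWER(email)
          ORDER BY updated_at DESC
        ) AS row_num
      FROM `my-project.leads.contacts`
    )
    WHERE row_num = 1
    """

    client.query(DEDUP_SQL).result()  # block until the job finishes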

Integrated web scrapers into Airflow as a DAG to enable regular data refreshing on a fixed cadence, contributing to streamlined data workflows and improved operational efficiency.
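
A minimal sketch of how a scraper is wired into Airflow for a fixed cadence, using Airflow 2.x imports and a hypothetical scrape_listings callable; the production DAGs, schedules, and task graphs varied by source:

    # Minimal Airflow DAG that runs a web scraper once a day.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def scrape_listings(**context):
        # Placeholder for the actual scraper (requests/Selenium + parsing),
        # which would push its results to the warehouse.
        pass

    with DAG(
        dag_id="refresh_competitor_listings",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 2 * * *",  # every day at 02:00
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        PythonOperator(
            task_id="scrape_listings",
            python_callable=scrape_listings,
        )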

Configured and implemented data integration protocols, including API and webhook setups, to seamlessly incorporate information from diverse third-party cloud systems into a centralized data lake or BigQuery infrastructure.
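
As a hedged example of the webhook side, a small Flask receiver (deployable to Cloud Run, for instance) that appends incoming third-party events to a BigQuery table via streaming inserts; the endpoint path, destination table, and payload fields are placeholders:

    # Minimal webhook receiver that streams third-party events into BigQuery.
    import json

    from flask import Flask, jsonify, request
    from google.cloud import bigquery

    app = Flask(__name__)
    bq = bigquery.Client()
    TABLE = "my-project.raw.third_party_events"  # hypothetical destination

    @app.route("/webhooks/events", methods=["POST"])
    def receive_event():
        payload = request.get_json(force=True)
        row = {
            "event_type": payload.get("type"),
            "received_at": payload.get("timestamp"),
            "raw": json.dumps(payload),
        }
        errors = bq.insert_rows_json(TABLE, [row])  # streaming insert
        if errors:
            return jsonify({"errors": errors}), 500
        return jsonify({"status": "ok"}), 200

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)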

Engineered and deployed Kafka connectors to enable real-time data streaming from Postgres databases to BigQuery, facilitating seamless and continuous data flow between the systems.
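
A hedged sketch of how such a connector can be registered through the Kafka Connect REST API, shown here for a Debezium Postgres source (a BigQuery sink connector is registered the same way); hosts, credentials, and table names are placeholders, not the production configuration:

    # Register a Debezium Postgres source connector via the Kafka Connect REST API.
    import requests

    CONNECT_URL = "http://kafka-connect:8083/connectors"

    connector = {
        "name": "postgres-leads-source",
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": "postgres",
            "database.port": "5432",
            "database.user": "replicator",
            "database.password": "example-password",
            "database.dbname": "app",
            "database.server.name": "app_db",
            "table.include.list": "public.leads",
            "plugin.name": "pgoutput",
        },
    }

    resp = requests.post(CONNECT_URL, json=connector, timeout=30)
    resp.raise_for_status()
    print(resp.json())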

Collaborated with a multinational data team to provide technical consultancy, merge request (MR) review, and knowledge sharing, fostering a collaborative and supportive working environment.

Took a proactive approach to automating lead-outreach operations with technologies such as OpenAI GPT, Zapier, Customer.io, and Respond.io, significantly reducing latency and enabling the recruitment team to focus their efforts in the outreach process.
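
As a hedged illustration of the GPT piece only, drafting a personalized outreach message from lead fields with the openai Python client (v1+); the lead record and model name are made up, and the real flow chained this output into Zapier, Customer.io, and Respond.io:

    # Draft a short outreach message for a lead (requires OPENAI_API_KEY).
    from openai import OpenAI

    client = OpenAI()

    lead = {"name": "Aisha", "role": "Engineering Manager", "company": "Acme"}  # hypothetical

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "You write short, friendly recruitment outreach messages."},
            {"role": "user",
             "content": (
                 f"Write a three-sentence outreach message to {lead['name']}, "
                 f"a {lead['role']} at {lead['company']}, about remote opportunities."
             )},
        ],
    )

    print(response.choices[0].message.content)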

Designed and orchestrated UiPath RPA robots to automate repetitive processes within HubSpot.

Senior IT Systems Analyst (Remote)
Lifewood Data Technologies | Oct 2020 - Aug 2022
Python
SQL
MySQL
Web Scraping
Airflow

Managed a multinational team of developers.
Led system architecture, design, research, POCs, and implementation.
Supported and mentored junior programmers as needed (blockers, bugs, business processes, etc.).

Data Engineering
- Deployed Greenplum as a data warehouse.
- Deployed Airflow 1.10.9 (Dockerized) as a core ETL tool.
- Raw data collection from multiple local & cloud servers via Airflow.
- Implemented reusable custom plugins in Airflow (Python); see the sketch after this list.
- Implemented web scrapers (Selenium, BeautifulSoup) as DAGs in Airflow.
- More than 140 active DAGs running daily to synchronize data into the data warehouse.
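
A minimal sketch of a reusable custom operator of the kind referenced above, written in the Airflow 1.10.x plugin style; the operator name and sync logic are illustrative placeholders:

    # Reusable custom operator registered via an Airflow 1.10.x plugin.
    from airflow.models import BaseOperator
    from airflow.plugins_manager import AirflowPlugin
    from airflow.utils.decorators import apply_defaults

    class SyncToWarehouseOperator(BaseOperator):
        """Pull rows from a source connection and load them into the warehouse."""

        @apply_defaults
        def __init__(self, source_conn_id, target_table, *args, **kwargs):
            super(SyncToWarehouseOperator, self).__init__(*args, **kwargs)
            self.source_conn_id = source_conn_id
            self.target_table = target_table

        def execute(self, context):
            # Placeholder: read from the source hook and COPY into Greenplum.
            self.log.info("Syncing %s into %s", self.source_conn_id, self.target_table)

    class WarehousePlugin(AirflowPlugin):
        name = "warehouse_plugin"
        operators = [SyncToWarehouseOperator]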

Low-code platform implementation & development (Joget).
- Data collection
- Simple ERP system (for e-commerce)

Named Entity Recognition (NER)
- Used spaCy v2.3 to train our own model.
- Automatically highlights suggested entities in a given document to increase production efficiency.
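
A condensed sketch of the spaCy v2.x training loop this relied on; the labels and the two training sentences are made up for illustration:

    # Fine-tune a blank spaCy 2.x NER model on a toy dataset.
    import random

    import spacy

    TRAIN_DATA = [
        ("Lim joined Lifewood in Penang.",
         {"entities": [(0, 3, "PERSON"), (11, 19, "ORG")]}),
        ("The documents were scanned in Kuala Lumpur.",
         {"entities": [(30, 42, "GPE")]}),
    ]

    nlp = spacy.blank("en")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)
    for _, annotations in TRAIN_DATA:
        for _start, _end, label in annotations["entities"]:
            ner.add_label(label)

    optimizer = nlp.begin_training()
    for _ in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], sgd=optimizer, drop=0.2, losses=losses)

    nlp.to_disk("ner_model")  # model is then loaded by the annotation tool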

Web Crawling
- Collected client requirements and planned the system design with the Python programmers.
- Used AWS compute instances for code execution and S3 buckets for data storage.

OCR services (Python)
- Integrated Google Vision and PaddleOCR (CPU & GPU versions) to support production.
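
An abridged sketch of the Google Vision side of that integration, assuming application-default credentials and the google-cloud-vision client (v2+); the PaddleOCR path followed the same read-image-return-text shape, and the file path here is illustrative:

    # Extract text from a scanned page with Google Cloud Vision.
    import io

    from google.cloud import vision

    def ocr_page(path):
        client = vision.ImageAnnotatorClient()
        with io.open(path, "rb") as f:
            image = vision.Image(content=f.read())
        response = client.document_text_detection(image=image)
        if response.error.message:
            raise RuntimeError(response.error.message)
        return response.full_text_annotation.text

    print(ocr_page("scans/page_001.png"))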

IoT Project (Automated Delivery Vehicle)
- Developed the backend system using Flask (see the sketch after this list).
- Managed developers on frontend development and its integration with the backend and MQTT.
- Liaised with the client on project progress updates and feedback.
- Deployed the solution with Docker & Docker Compose.
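
A compact sketch of how the Flask backend and MQTT fit together, using paho-mqtt 1.x; the broker address, topic names, and payload fields are illustrative, not the production protocol:

    # Minimal Flask backend that publishes delivery commands to vehicles over
    # MQTT and caches their last reported status.
    import json

    import paho.mqtt.client as mqtt
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    vehicle_status = {}  # vehicle_id -> last telemetry payload

    mqttc = mqtt.Client()

    def on_message(client, userdata, msg):
        # Telemetry arrives on vehicles/<id>/status
        vehicle_id = msg.topic.split("/")[1]
        vehicle_status[vehicle_id] = json.loads(msg.payload)

    mqttc.on_message = on_message
    mqttc.connect("mqtt-broker", 1883)
    mqttc.subscribe("vehicles/+/status")
    mqttc.loop_start()

    @app.route("/vehicles/<vehicle_id>/dispatch", methods=["POST"])
    def dispatch(vehicle_id):
        command = request.get_json(force=True)  # e.g. {"destination": "dock-3"}
        mqttc.publish("vehicles/{}/commands".format(vehicle_id), json.dumps(command))
        return jsonify({"status": "sent"}), 202

    @app.route("/vehicles/<vehicle_id>/status", methods=["GET"])
    def status(vehicle_id):
        return jsonify(vehicle_status.get(vehicle_id, {}))

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)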

DevOps
- Implemented CI/CD practices for system development using Drone.io, Bitbucket, AWS EC2, DingTalk & Docker.

Worked with several cloud services (instances, APIs, storage)
- AWS (Lambda, Transcribe, S3)
- Google Cloud Platform (Vision, Speech-To-Text, Cloud Storage)

Education

Tunku Abdul Rahman College
Bachelor's degree・Information Systems Engineering
May 2008 - May 2010

Personal Projects

Web Scraping
2022
Python
Web Scraping
Freelance projects.