Personal details

Guangming L. - Remote data scientist

Chief Data Scientist
Based in: 🇹🇭 Thailand
Timezone: Bangkok (UTC+7)

About

I'm a seasoned data scientist with more than ten years of experience. I specialize in creatively applying modern data science technologies and machine learning algorithms to extract actionable insights from big data that impact the bottom line. Over the past ten years, I have worked with more than 150 clients, earned over $400K on Upwork alone, and maintained a Job Success Score above 95% at all times. I have also been an Expert-Vetted (top 1%) data scientist on Upwork for more than eight years.

Work Experience

Owner and Data Scientist
Cabaceo LLC | Oct 2013 - Present
Python
SQL
R
Machine learning
Data Science
  • Founded the business and scoped projects. Worked with 150+ clients and maintained a Job Success Score above 95% at all times.
  • Recruited and managed a team of 2 to 5 data analysts, statisticians, and developers.
  • Analyzed data from various industries, including FinTech, financial services, eCommerce, retail, education, medicine & biotech, marketing & advertising, and HR & operations.
  • Performed exploratory data analysis in Python using pandas, numpy, and matplotlib, and in R using tidyverse and ezplot. Applied k-means, hierarchical clustering, principal component analysis, and other unsupervised machine learning methods, as well as missing-value imputation methods.
  • Performed statistical data analysis and inference in R using methods including A/B testing, generalized estimating equations, linear mixed models, panel data methods, survival analysis, causal inference, and Bayesian methods.
  • Built, validated, tested, and deployed predictive models in Python (statsmodels, scikit-learn) and R (caret, vtreat) using machine learning algorithms such as regression (linear, logistic, ridge, and LASSO), support vector machines, decision trees, random forests, XGBoost, time series methods (ARIMA), and neural networks.
  • Built deep learning models in Python with TensorFlow and Keras.
  • Built reports and slides in R Markdown.
  • Built dashboards and data apps in R flexdashboard, R Shiny, and Python Streamlit.

Chief Data Scientist
Liquid P2P, LLC | Nov 2016 - Nov 2018
Python
R
Machine learning
FinTech
AWS
  • Built, validated, tested, and deployed XGBoost models on AWS to predict the default risk of LendingClub’s P2P loans. My models regularly identified at least 5% more high-quality notes than LendingClub’s recommended mix across loan grades.
  • Built API services to pass data between the database, the ML models, and the frontend dashboard.
  • Analyzed the life cycle of P2P loans and devised a pricing formula that enabled quick, profitable sales of seasoned notes on the secondary market, yielding an average profit of 0.2% to 2% per note.

Projects

Chatbot as Information Retriever
2023
Python
NLP
Chatbot
OpenAI
AI
I enjoy reading dense articles, but I quickly forget what I read. I take notes, but finding a specific piece of information in a sea of notes takes forever. To solve this problem, I built a chatbot on top of the articles I read using OpenAI's API, llama_index, langchain, and Python Streamlit. When asked a question, it retrieves answers from those articles.

Payments Data Analysis and GMV (Gross Merchandise Value) Forecasting
Python
Matplotlib
Pandas
Machine learning
Statistics
Data analysis
Data Science
Data Visualization
Data analytics
Performed exploratory data analysis using pandas and matplotlib on a dataset from a Japanese payment processing company. The dataset contained 1,582,260 transactions between 9,542 users and 95,460 stores from 1 Jan 2020 to 31 Dec 2021. Forecasted GMV for each user for January 2022 by training XGBoost models iteratively and running walk-forward validation in Python scikit-learn. My models beat the baseline, reducing walk-forward validation RMSE by 34% on average. Forecasted GMV for the company as a whole for each day of January 2022 by training SARIMAX models iteratively and running walk-forward validation in Python statsmodels. My models achieved an average validation mean absolute percentage error (MAPE) of 5.12%, ranging from 3.73% (best) to 7.08% (worst).

Education

University of Michigan
Master's degree・Biostatistics
Sep 2009 - Jun 2012
New College of Florida
Bachelor's degree・Mathematics
Sep 2004 - May 2007

Certifications & Awards

National Science Foundation Graduate Research Fellowship
National Science Foundation | Sep 2009