EXPERIENCE SUMMARY
• 8 years of experience as a Senior Data Scientist in biotechnology, market research, telecom, automotive, and network security, and 4 years of experience as an AI engineer for cloud Spark applications on AWS and Azure.
• Recent research and development on anomalous traffic detection in CAN workflows; high-level feature extraction using Bayesian variables; object detection and action classification.
• Designed and implemented machine learning algorithms for churn prediction and Add a Line analysis, and machine learning pipelines for telecom business analysis.
• Developed advanced machine learning algorithms for classification of text, POS outlet items, and category hierarchies, including multilabel and multitask classification, text classification, and NLP.
• Developed advanced machine learning regression and classification algorithms for food component analysis using chemometrics and spectroscopy.
• Postdoctoral research on uniform and unbiased sampling/crawling of online social networks using advanced Markov Chain Monte Carlo techniques; developed an innovative sampling algorithm based on a new coupling technique, implemented with Ruby and Rails, the Twitter API, and DataMapper on Unix/Linux and Amazon EC2; social media analysis using Python, NLTK, and sklearn.
Project I: IBM SPSS Python conversion. Implemented an end-to-end Python pipeline with Spark on the AWS cloud to replace the original SPSS stream models. The main tasks included data collection; conversion of SPSS stream nodes (type, filter, select, filler, merge, aggregate, flag, supernodes, cache) and statistical outputs to PySpark running on Spark on AWS; unit tests; and a Spark storage plan for optimization. The work also covered techniques for transferring to Google Cloud Platform (GCP) services: BigQuery, Dataflow, Pub/Sub, Bigtable, Data Fusion, Dataproc, Cloud Composer, Cloud SQL, Compute Engine, Cloud Functions, and App Engine, as well as BigQuery ML, AutoML, and Vertex AI.
Developed and implemented Python scripts for data parsing, data imputation, and data encoding using sklearn and pandas. Developed Python scripts to train and build models and to run tests evaluating the system performance of AI solutions, using sklearn and pandas. Developed and implemented Python scripts for AI solutions covering forecast, regression, and classification modeling, including univariate and multivariate forecasting and regression, and for log time series analysis. Developed MLflow workflows for model tracking, training, logging, registration, inference, and hyperopt/parameter sweeps. Analyzed and validated business requirements and reviewed solutions with relevant stakeholders. Prepared the technical report on MLflow project development and the production solution for the Azure cloud AI solution, including anomaly detection and sentinel, and the research report on improving forecast models for rail transportation with Azure DevOps and Databricks.
Technology: PySpark, pandas, sklearn, MLflow, Azure Databricks, DevOps