Working with National Grid and other academic consultants, delivered a study on projected mid-term security-of-supply indices for the UK power system under a range of decarbonisation scenarios.


Statistical consultant

Developed an end-to-end machine learning pipeline for student failure prediction, intended as a new data service.


ML consultant

* Created data modelling and visualisation products on top of the Twitter Firehose to forecast the reach of influencer-based marketing campaigns on different consumer segments.
* Helped expand the company’s catalogue of data analytics services through the development of machine learning pipelines for the inference of sociodemographic features of social media users.
* Used NLP techniques and network models on a regular basis to answer customer-specific analytics questions regarding public perception and conversation on social media for customers like Coca-Cola and the Mexican Presidential Office.
* Implemented network community detection algorithms from the academic literature to improve the catalogue of analytics services.



Lead Data Scientist

* As part of the data science team, improved the accuracy of credit scoring algorithms by 4%.
* Developed weekly KPI forecasting reports for executive-level managers.


Data Modelling Scientist

The projects I've worked on range from Finance to Marketing and Education. You can see some of them at www.nestorsag.com.
I usually work with R, Python, Scala, MySQL, Linux and Git, but I'm also comfortable with big data tools and cloud platforms such as Spark, MongoDB and AWS/Google Cloud.

I've used Scala mainly to write Spark programs using Spark's MLlib; more recently, I used Scala and Spark for my MSc dissertation to develop a library implementing Gaussian Mixture Models amenable to stochastic optimisation and streaming data. You can find it at https://github.com/nestorSag/streaming-gmm

I'm a very experienced R user; some of the things I've done with it include fitting Machine Learning models, Text Mining, network modelling and analysis, data visualisation, web scraping, and huge amounts of data cleaning.

I've used Python for 4 years now, mainly to do data cleaning and to deploy small web applications on Google Cloud using Flask; more recently I've used it to develop Machine Learning models with PyTorch and scikit-learn.

My Applied Mathematics background has a heavy Statistics component that I expanded in the MSc and plan to keep expanding in the PhD. This includes modelling approaches that can be very useful for data with special structure that vanilla Machine Learning models can't handle directly, such as Generalized Linear Models for ordinal and integer outcomes, time series models for sequential data, and non-parametric approaches that allow fitting highly non-linear models with very few observations. Using statistical theory along with other tools generally enhances interpretability and model accuracy.

Machine Learning Engineer

Nestor Y. Sanchez

A Gaussian Mixture (GM) is a popular clustering model that is usually fitted using the Expectation-Maximization (EM) algorithm. This makes the model difficult to scale, since EM is a batch algorithm that is not suited to very large datasets or data streams. Taking this paper as a starting point, in this project I developed a Scala library that fits Gaussian Mixture Models (GMMs) using accelerated stochastic gradient descent, which addresses the problems mentioned above and achieves fast convergence; it can run sequentially or in parallel using Spark.

Fitting Large-Scale Gaussian Mixtures With Accelerated Gradient Descent

Python package implementing a GCP-based end-to-end machine learning pipeline for generative deep learning models using typeface data. It uses Beam for data preprocessing, TensorFlow for model training and MLflow for experiment tracking.


Generative deep learning models for text font generation

Automated bootstrapping of ephemeral, highly available Kubernetes clusters from the ground up on AWS EC2, using mainly Ansible, Terraform and GNU Make.

Automated bootstrapping of Kubernetes clusters on EC2

Python package implementing univariate and bivariate parametric extreme value models and optimized functionality for power system adequacy assessment.

Package for extreme value modelling in Python

Python package implementing deep reinforcement learning algorithms using PyTorch and OpenAI's Gym library.

Personal details

Nestor S.

Summary

Technical skills

Work Experience

Education

Personal Projects

Certifications & Awards