Personal details

Carlos G. - Remote data scientist


Based in: 🇫🇮 Finland
Timezone: Helsinki (UTC+2; UTC+3 during summer time)

Summary

Over 10 years of experience building data systems with machine learning, NLP, statistical modelling, mathematical optimisation and data visualisation.

Work Experience

Data Engineer/Scientist
Perceptive Constructs | Aug 2010 - Present
Python
Java
C++
Scala
Node.js
PostgreSQL
Azure
Boost
NumPy
D3.js
Pandas
Redis
Machine Learning
SciPy
Google BigQuery
Data Science
NLP (Natural Language Processing)
Google Cloud Platform
Data Visualization
Tableau
JavaScript
Rust
Apache Spark
Amazon Redshift
Apache Kafka
Kubernetes
Deep Learning
Spark Streaming
Terraform
TensorFlow
Grafana
spaCy
Gurobi
Recommendation systems
Data Engineering
Looker
Airflow
Prometheus
Data Architecture
Mathematical Optimization
PyTorch
PuLP (Python)
MLflow
FastAPI
Faiss
Delta Lake
Hugging Face
AI (artificial intelligence)
AWS (Amazon Web Services)
scikit-learn
LangChain
Gradio

I built a number of systems while embedded in customers' product development teams. Most notably:

At Staq.io, a fintech startup (GCP, RunPod):

  • AI prototype to assist compliance officers. End-to-end NLP pipeline, from PDF data
    extraction and transformation, through model fine-tuning, to RAG and a prototype UI.

At Intel471, a cybersecurity company (AWS):

  • Multilingual NLP pipeline for knowledge extraction from discussion forums. Data curation, extraction, transformation and annotation. Model fine-tuning, custom decoding, evaluation and deployment. Data visualisation.

At Bank Al Etihad, a retail bank (AWS and on-premises):

  • Analytics pipeline and data warehouse, with anonymisation and streaming data fusion.
  • Bespoke offer recommender built on transaction data.
  • Deposit rate optimisation, causal estimation. Churn, loan, and card risk models.

At Verto Analytics, a behavioural analytics startup (AWS):

  • Processing pipeline optimisation.
  • Workflow orchestration for deliveries.

At MarkaVIP, an e-commerce startup focused on flash sales (AWS):

  • Bespoke low-latency recommender.

Director of Data Science
MarkaVIP | Aug 2015 - Aug 2016
Python
MySQL
D3.js
Data Science
Data Visualization
JavaScript
Apache Spark
Amazon Redshift
AWS Kinesis
Recommendation systems
Data Engineering
Data Architecture
PuLP (Python)
Software Architecture
AWS (Amazon Web Services)

My focus was on laying the foundations for data science so that it could deliver sustainable results.

Data Engineering achievements:

  • Built and deployed a foundational analytical pipeline (AWS Kinesis, Redshift, Spark).
  • Integrated continuous data ingestion from key systems into the pipeline, using
    low-latency interfaces such as MySQL database replication wherever practical.
  • Migrated interaction tracking and key analytical systems to the pipeline.

Notable Data Science work:

  • Modelled customer-experience interventions to reduce returns and cancellations.
    Built a policy optimiser using retrospective simulation on historical data (Python,
    Kinesis, Redshift).
  • Improved profitability by optimising basket constraints and incentives (Python, PuLP).
  • Performed retrospective analysis of sourcing performance and pricing (Python).

Education

Universidade Nova de Lisboa
Master's degree, Computer Science
Sep 1991 - Jul 1996

Certifications & Awards

Google Cybersecurity Specialization
Coursera | Oct 2023