Personal details

David B. - Remote data engineer


Timezone: Amsterdam (UTC+2)

Summary

From very early in my career I have been teaching developers at all levels (junior, medior and senior) how to do things. This has been possible thanks to my ability to explain complex subjects and concepts in a down-to-earth way that combines programming logic with real-life logic.

Learning programming logic, data modeling and general IT logic takes time, practice and the drive to generalize and simplify every problem you come across.

With me, you will be able to understand the details of deployments, CI/CD, Docker and any programming language, and remember them, thanks to my teaching methods.

Skills and tools as of 10/08/2022 (4 years of working experience):

Data Storage/Data Lake: Azure Data Lake, Kafka 0.10+, ElasticSearch, OpenDistro.
Data Processing / Visualization: Logstash, PySpark, Spark, Kibana.
Databases: PostgreSQL, MariaDB, SQLite, MongoDB, DynamoDB, Cassandra.
Primary Languages: Python (3.x), SQL/T-SQL, C++ (17), Ruby (2.6.2+), JS (ES6+), Ansible Playbooks.
Secondary Languages: C, Java 11 (low experience).
Distributions: CentOS (7/8), Red Hat, Debian, Ubuntu.
Architecture: Nginx, ProxySQL, BDR, Galera, AWS, Azure, Docker, Zookeeper.
Cloud: Azure, AWS.
Other tools: Git, Jenkins, Postman, Ansible Playbooks, SSMS, DBeaver, Kubernetes (low experience), WSL2.

Work Experience

Data Engineer Consultant
Essent | Jul 2022 - Present
Python
SQL
PostgreSQL
Spark streaming
Go (Golang)
AWS (Amazon Web Services)
Recently joined Essent.
Data Engineer
iVent Mobile | Jan 2021 - Jul 2022
Ruby
Python
SQL
Ruby on Rails
Git
Nginx
Elasticsearch
GitLab
Docker
Apache Kafka
- Deployed a geo-redundant Kafka cluster (8 nodes) across 4 POPs around the world and set up MirrorMaker 2 (MM2) redundancy for real-time data streaming. This cluster supports data from over 2,500 customer planes, around 6.5 million documents every 5 minutes. (Kafka, CentOS 7)
- Set up and maintained a 9-node Galera cluster with ProxySQL for Observium (an NMS). This system now monitors over 2,500 VMs in 4 data centers around the world and provides critical status updates to hundreds of support engineers. (PostgreSQL, Observium, ProxySQL, MariaDB, Galera)
- Worked on data parsing, processing, analysis and visualization. Created and combined several internal and external data sources to enrich our clients' datasets. (Ruby, Rails, ElasticSearch, Kibana, PostgreSQL, Kafka)
- Designed and implemented a data model capable of supporting multiple mobile technologies and their metadata, regardless of source and data structure. This cut implementation time by over 4 hours per technology and standardized all sources. (PostgreSQL, Rails)
- Designed a 6-node Spark cluster that ingests data from data collectors located in the 3 world POPs. Provided near-real-time availability, 100% data consistency, and greater ingest, processing and write speed using MapReduce and micro-batch streaming. (Spark, PySpark, ZooKeeper, Python, Ansible, Kafka)
- Designed and developed a geo-redundancy and failover system for Delayed Jobs, making all our data processors redundant and fail-safe. It also allowed us to run Delayed Jobs workers on 4 hosts in 4 locations around the world. (Ruby, PostgreSQL, Rails)
- Set up and maintained a redundant, internet-facing Kong proxy (4 VMs with custom Lua plugins) and 2 custom PostgreSQL clusters (2 nodes each) on versions 9 and 10 in a master-slave infrastructure. (Kong, Ansible, Lua, Keepalived, CentOS 7, Yum)

Personal Projects

Pypdfy
2019
Python
Regex
Data Analysis
Pypdfy is a package that provides a set of tools for analyzing PDF structures.
Confidential Project
2019
Python
SQL
XML
Automation
Algorithm
React
Autocad
Data parsing