Personal details

David B. - Remote data engineer


Timezone: Amsterdam (UTC+2)

Summary

From very early in my career I have been teaching developers at all levels (junior, medior and senior) how to do things. This has been possible thanks to my ability to explain complex subjects and concepts in a down-to-earth way that combines programming logic with real-life logic.

Learning programming logic, data modeling and general IT logic takes time, practice and the drive to generalize and simplify every problem you come across.

With me, you will be able to understand the details of deployments, CI/CD, Docker and any programming language, and remember them, thanks to my teaching methods.

Skills and tools as of 10/08/2022 (4 years of working experience):

Data Storage/Data Lake: Azure Data Lake, Kafka 0.10+, ElasticSearch, OpenDistro.
Data Processing / Visualization: Logstash, PySpark, Spark, Kibana.
Databases: PostgreSQL, MariaDB, SQLite, MongoDB, DynamoDB, Cassandra.
Primary Languages: Python (3.x), SQL/T-SQL, C++ (17), Ruby (2.6.2+), JS (ES6+), Ansible Playbooks.
Secondary Languages: C, Java 11 (low experience).
Distributions: CentOS (7/8), Red Hat, Debian, Ubuntu.
Architecture: Nginx, ProxySQL, BDR, Galera, AWS, Azure, Docker, Zookeeper.
Cloud: Azure, AWS.
Other tools: Git, Jenkins, Postman, Ansible Playbooks, SSMS, DBeaver, Kubernetes (low experience), WSL2.

Work Experience

Data Engineer Consultant
Essent | Jul 2022 - Present
Python
SQL
PostgreSQL
Spark streaming
Go (Golang)
AWS (Amazon Web Services)
Recently joined Essent.
Data Engineer
iVent Mobile | Jan 2021 - Jul 2022
Ruby
Python
SQL
Ruby on Rails
Git
Nginx
Elasticsearch
GitLab
Docker
Apache Kafka
- Deployed a geo-redundant Kafka cluster (8 nodes) across 4 POPs around the world and set up MirrorMaker 2 (MM2) redundancy for real-time data streaming. This cluster supports data from over 2,500 customer planes, around 6.5 million documents every 5 minutes. (Kafka, CentOS 7)
- Set up and maintained a 9-node Galera cluster with ProxySQL for Observium (an NMS). This system now monitors over 2,500 VMs in 4 data centers around the world and provides critical status updates to hundreds of support engineers. (PostgreSQL, Observium, ProxySQL, MariaDB, Galera)
- Worked on data parsing, processing, analysis and visualization. Created and combined several internal and external data sources to enrich our clients' datasets. (Ruby, Rails, ElasticSearch, Kibana, PostgreSQL, Kafka)
- Designed and implemented a data model capable of supporting multiple mobile technologies and their metadata, regardless of source and data structure. This cut implementation time by over 4 hours per technology and standardized all sources. (PostgreSQL, Rails)
- Designed a 6-node Spark cluster that ingests data from data collectors located in the 3 world POPs. Provided near-real-time availability, 100% data consistency, and greater ingest, processing and write speed using MapReduce and micro-batch streaming. (Spark, PySpark, ZooKeeper, Python, Ansible, Kafka)
- Designed and developed a geo-redundancy and failover system for Delayed Jobs, making all our data processors redundant and fail-safe. It also allowed us to run Delayed Jobs workers on 4 hosts in 4 locations around the world. (Ruby, PostgreSQL, Rails)
- Set up and maintained a redundant, internet-facing Kong proxy (4 VMs with custom Lua plugins) and 2 custom PostgreSQL clusters (2 nodes each) on versions 9 and 10 in a master-slave infrastructure. (Kong, Ansible, Lua, Keepalived, CentOS 7, Yum)

Personal Projects

Pypdfy
2019
Python
Regex
Data Analysis
Pypdfy is a package that provides a set of tools for analyzing PDF structures.
Confidential Project
2019
Python
SQL
XML
Automation
Algorithm
React
Autocad
Data parsing