Personal details

João R. - Remote data engineer

Timezone: Bucharest (UTC+3)

Summary

I’m a self-motivated engineer passionate about data warehousing and data engineering, capable of easily adapting to new environments. I enjoy discussions about technology and creative ways to approach complex problems.

I have extensive experience developing data applications across different industry sectors such as finance, telco, retail and marketing. During my career, I have worked with different data architectures and made use of multiple relational databases, the Hadoop ecosystem and cloud infrastructure.

I hold myself to a high standard of attention to detail and thoroughness, and I take a pragmatic view of the benefits of each project I participate in.

Work Experience

Senior Analyst, Data Engineer
KNEIP | Dec 2018 - Mar 2020
Java
HBase
Apache Spark
Apache Kafka
Kubernetes
Apache Hadoop
CI/CD
Kafka Streams
Apache NiFi
- Developed Kafka Streams applications supporting an event-driven architecture with microservices.
- Developed Spark Streaming applications to consume data from Kafka and load a Fund Data Management data model in HBase (a minimal sketch of this kind of pipeline follows below).
- Participated in creating a CI/CD pipeline for Kafka Streams applications, migrating to containers and Kubernetes orchestration.
- Implemented NiFi processor groups to integrate data sourced from files via FTP.
- Participated in data modelling for Fund Data Management.
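
Illustrative only: a minimal PySpark Structured Streaming sketch of the kind of Kafka-to-Spark consumption described above. The broker address, topic name, event schema and checkpoint path are assumptions made for the example, not details of the actual KNEIP platform, and the console sink stands in for the real HBase load.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    # Hypothetical schema for fund events; the real data model is not shown here.
    event_schema = StructType([
        StructField("fund_id", StringType()),
        StructField("attribute", StringType()),
        StructField("value", StringType()),
    ])

    spark = SparkSession.builder.appName("fund-event-ingest").getOrCreate()

    # Read the event stream from Kafka (broker and topic names are placeholders).
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")
        .option("subscribe", "fund-events")
        .load()
        .select(from_json(col("value").cast("string"), event_schema).alias("event"))
        .select("event.*")
    )

    # In production the sink was HBase; the console sink keeps the sketch self-contained.
    query = (
        events.writeStream
        .outputMode("append")
        .format("console")
        .option("checkpointLocation", "/tmp/checkpoints/fund-events")
        .start()
    )
    query.awaitTermination()
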
Senior Developer
Sagacity Solutions | Feb 2017 - Nov 2018
Python
SQL
MySQL
Teradata
Apache Spark
Apache Hadoop
Apache Hive
- Developed a bespoke Value Based Management analytics solution for telecommunications company Telstra. The solution, built within a data warehouse supported by Teradata, included modules for tenure and cashflow forecasts as well as investment data integration.
- Designed and developed a configuration-driven product for Value Based Management using Apache Spark, standardizing the core algorithms (a minimal sketch of a configuration-driven step follows below).
- Supported the implementation of the Value Based Management product for telecommunications group Tele2 in three countries: Estonia, Latvia and Lithuania.
- Oversaw the Value Based Management product operating in a Software-as-a-Service model on AWS.
- Developed ETL to enable a Revenue Assurance process related to call-center operations for telecommunications company TalkTalk, using a data warehouse supported by Netezza.
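
Illustrative only: a minimal sketch of what "configuration-driven" can look like in practice with PySpark. The configuration keys, table name and column names are invented for the example; they are not taken from the actual product.

    import json

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("vbm-config-driven-etl").getOrCreate()

    # Hypothetical per-client configuration: source table, column mapping and a
    # row filter are the kind of settings that vary between implementations.
    config = json.loads("""
    {
      "source_table": "billing.monthly_revenue",
      "columns": {"cust_id": "customer_id", "rev_amt": "revenue"},
      "filter": "revenue > 0"
    }
    """)

    df = spark.table(config["source_table"])

    # Rename source columns to the standardized names expected by the shared
    # core algorithms, then apply the client-specific filter.
    for source_name, standard_name in config["columns"].items():
        df = df.withColumnRenamed(source_name, standard_name)
    df = df.filter(config["filter"])

    df.groupBy("customer_id").agg(F.sum("revenue").alias("total_revenue")).show()
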

Personal Projects

KNEIP Digital Platform
2020
Java
HBase
Apache Spark
Apache Kafka
Kubernetes
Spark Streaming
Apache Hadoop
CI/CD
Kafka Streams
Apache NiFi
A complete digital platform for Fund Data Management, capable of handling the entire life cycle of fund data, integrating multiple sources and supporting multiple reporting and publishing targets across different media. I was a senior data engineer within a cross-functional team responsible for real-time data integration from different sources into a data model capable of supporting multiple products. The platform implemented an event-driven architecture with microservices. I was heavily involved in developing the data ingest pipeline, making use of Apache NiFi, Kafka Streams, Apache Spark and HBase.
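
Illustrative only: one way to land such a stream in HBase from Python is Spark's foreachBatch combined with the happybase client. The Thrift host, table name, column family and row-key layout below are assumptions for the example, not the platform's actual data model.

    import happybase
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fund-model-loader").getOrCreate()

    def write_to_hbase(batch_df, batch_id):
        """Write one micro-batch of fund attribute rows into HBase."""
        # Connection details and table layout are placeholders.
        connection = happybase.Connection("hbase-thrift-host")
        table = connection.table("fund_data")
        with table.batch(batch_size=1000) as hbase_batch:
            # collect() keeps the sketch short; a real loader would write per partition.
            for row in batch_df.collect():
                row_key = f"{row.fund_id}#{row.attribute}".encode()
                hbase_batch.put(row_key, {b"d:value": str(row.value).encode()})
        connection.close()

    # `events` is assumed to be a streaming DataFrame of (fund_id, attribute, value),
    # e.g. the one from the earlier Kafka sketch:
    # events.writeStream.foreachBatch(write_to_hbase).start()
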
Value Based Management (VBM) Product
2018
Python
SQL
Apache Spark
Apache Hadoop
VBM stands for ‘Value Based Management’, a solution that allows businesses to improve their profitability by providing detailed customer-level insight into which customers deliver the most value. It also looks to create an appropriate and sustainable approach to governance and a culture focused on long-term value creation. I was the lead developer of a configuration-driven product containing VBM’s core modules: tenure and cashflow forecasts as well as investment data integration. I also participated in different implementations of this product, delivering client-specific customization and supporting technical deployment in different environments, both cloud (AWS) and on-premises Hadoop clusters. The product is written in Python and supported by Apache Spark.
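
Illustrative only: a toy version of what a tenure-forecast module in such a product might look like in PySpark. The column names and the constant-churn (geometric) assumption are invented for the example and do not reflect the product's actual algorithms.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("vbm-tenure-forecast").getOrCreate()

    # Hypothetical customer base: one row per customer with a monthly churn rate.
    customers = spark.createDataFrame(
        [("c1", 0.02), ("c2", 0.05)],
        ["customer_id", "monthly_churn_rate"],
    )

    # Under a constant monthly churn rate p, expected remaining tenure is 1 / p months.
    forecast = customers.withColumn(
        "expected_remaining_tenure_months",
        F.lit(1.0) / F.col("monthly_churn_rate"),
    )

    forecast.show()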