Personal details

Sara F. - Remote data engineer

Based in: 🇧🇷 Brazil
Timezone: Brasília (UTC-3)

Summary

I began my career in data in 2016 as a database administrator for a small consulting firm in Brazil. After that experience, I explored other data-oriented roles and ultimately landed in data engineering. Since then, I have been striving to enhance my technical and non-technical skills. In my view, effective data engineering entails not only constructing data pipelines, but also promoting communication and ensuring that everyone speaks the same data language.

AWS | GCP | Python | Hadoop Ecosystem | ETL | Data Visualization | SQL | Big Data | Linux | Git

Work Experience

Senior Data Platform Engineer
Jusbrasil | Jan 2022 - Present
Python | MySQL | MongoDB | PostgreSQL | Amazon Redshift | Prometheus | Kafka

I am currently a Senior Data Engineer responsible for a platform that processes over 100 million records daily and manages 1,200+ data sources coming from Bigtable, MongoDB, PostgreSQL, MySQL, APIs, Kafka, NetSuite, Zendesk, and more. To support this demand, I have helped create highly flexible Spark frameworks specialized in data ingestion and storage, configurable per source via a HOCON configuration file. Beyond that, I have deployed and integrated multiple services into the platform infrastructure, such as Airbyte, Consul, and an OpenTelemetry server, using them to integrate with data sources and to increase the platform's stability and maintainability. Together with Prometheus and Grafana, they form an observability infrastructure that enables real-time monitoring of every element of the platform.
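
For illustration only, the sketch below shows the general idea of configuration-driven ingestion: a hypothetical per-source HOCON file is parsed with the pyhocon library and handed to a generic PySpark reader. The config schema, source names, paths, and options are assumptions for the example, not the framework's actual format.

# Illustrative sketch only: a generic, HOCON-driven Spark ingestion job.
# The config schema, source names, and paths below are hypothetical.
from pyhocon import ConfigFactory
from pyspark.sql import SparkSession

# Example per-source HOCON config (normally kept in its own .conf file).
# Credentials and JDBC drivers are elided for brevity.
SOURCE_CONF = """
source {
  name   = "orders_postgres"
  format = "jdbc"
  options {
    url     = "jdbc:postgresql://db-host:5432/orders"
    dbtable = "public.orders"
    user    = "reader"
  }
}
sink {
  format = "parquet"
  path   = "s3://datalake/raw/orders/"
  mode   = "append"
}
"""

def run_ingestion(conf_text: str) -> None:
    conf = ConfigFactory.parse_string(conf_text)
    spark = SparkSession.builder.appName(conf["source.name"]).getOrCreate()

    # Build a reader generically from whatever options the config declares.
    reader = spark.read.format(conf["source.format"])
    for key, value in conf["source.options"].items():
        reader = reader.option(key, value)
    df = reader.load()

    # Write to the configured sink; every source reuses this same code path.
    df.write.mode(conf["sink.mode"]).format(conf["sink.format"]).save(conf["sink.path"])

if __name__ == "__main__":
    run_ingestion(SOURCE_CONF)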

Main projects:

- Deployment of tools such as Airbyte, Prometheus, and Grafana, as well as custom services.

- Implementation of Consul in both frameworks, along with its infrastructure in Kubernetes.

- Creation of data pipelines from multiple sources: APIs, MongoDB, Bigtable, and MySQL.

- Re-architecture of the entire job-execution infrastructure, migrating from Spark on Kubernetes to Dataproc.

- Creation of a monitoring ecosystem using OpenTelemetry (a rough sketch follows this list).
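
A rough illustration of that monitoring piece (not the platform's actual code): a job exports a custom ingestion counter through the OpenTelemetry Python SDK to an OTLP endpoint, such as a collector whose metrics Prometheus can scrape and Grafana can chart. The metric name, attributes, and endpoint are assumptions.

# Illustrative only: exporting a custom ingestion metric with the
# OpenTelemetry Python SDK. The metric name, attributes, and collector
# endpoint are hypothetical.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Push metrics periodically to an OTLP-compatible collector.
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://otel-collector:4317", insecure=True)
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("data-platform.ingestion")
records_ingested = meter.create_counter(
    "records_ingested",
    unit="1",
    description="Records successfully ingested per source",
)

# Inside an ingestion job, record progress tagged with the source name.
records_ingested.add(50_000, {"source": "mongodb", "pipeline": "raw"})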

Data Engineer
Hash | Nov 2020 - Jan 2022
Python | ETL | Apache Spark | Kubernetes

I worked on a small Data Platform team, building a scalable, automated platform able to keep up with the company's growth. In 2020 the company grew 10x, and for 2021 we expected a 20x increase, which brought both technical and cultural challenges. Beyond building Looker models and views, writing ETL jobs in PySpark, and maintaining the legacy infrastructure, I also actively promoted a data culture by giving analytics classes to other teams.
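
As a hedged example of the kind of PySpark ETL job described above, the sketch below reads a raw dataset, aggregates it, and writes a curated table for Looker to query; the dataset names, columns, and paths are invented for illustration.

# Minimal illustrative PySpark ETL job; dataset names, columns, and paths
# are hypothetical and not taken from the actual platform.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_payments_etl").getOrCreate()

# Extract: read raw events landed by the ingestion layer.
raw = spark.read.parquet("s3://datalake/raw/payments/")

# Transform: keep approved payments and aggregate daily totals per merchant.
daily_totals = (
    raw.filter(F.col("status") == "approved")
       .withColumn("day", F.to_date("created_at"))
       .groupBy("day", "merchant_id")
       .agg(F.sum("amount").alias("total_amount"),
            F.count("*").alias("payment_count"))
)

# Load: write the curated table, partitioned by day, for downstream BI use.
daily_totals.write.mode("overwrite").partitionBy("day").parquet(
    "s3://datalake/curated/daily_payment_totals/"
)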

Main projects:

- Deployment of Airflow on a Kubernetes infrastructure.

- Structuring of the data lake layers.

- Deployment of Debezium.

- Ingestion API.

- Metrics API.

Education

Universidade Federal Fluminense
Bachelor's degree・Computing Systems
Mar 2016 - Dec 2019