Personal details

Sara F. - Remote data engineer

Based in: 🇧🇷 Brazil
Timezone: Brasília (UTC-3)

Summary

I began my career in data in 2016 as a database administrator for a small consulting firm in Brazil. After that experience, I explored other data-oriented roles and ultimately landed in data engineering. Since then, I have been striving to enhance my technical and non-technical skills. In my view, effective data engineering entails not only constructing data pipelines, but also promoting communication and ensuring that everyone speaks the same data language.

AWS | GCP | Python | Hadoop Ecosystem | ETL | Data Visualization | SQL | Big Data | Linux | Git

Work Experience

Senior Data Platform Engineer
Jusbrasil | Jan 2022 - Present
Python | MySQL | MongoDB | PostgreSQL | Amazon Redshift | Prometheus | Kafka

I am currently a Senior Data Engineer responsible for a platform that processes over 100 million records daily and manages 1,200+ data sources coming from Bigtable, MongoDB, PostgreSQL, MySQL, APIs, Kafka, NetSuite, Zendesk, and more. To support this demand, I have helped create highly flexible Spark frameworks specialized in data ingestion and storage, configurable per source via a HOCON configuration file. Beyond that, I have deployed and integrated multiple services into the platform infrastructure, such as Airbyte, Consul, and an OpenTelemetry server, using them to integrate with data sources and to increase the platform's stability and maintainability. Together with Prometheus and Grafana, they form an observability infrastructure that enables real-time monitoring of every element of the platform.
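
For illustration only, the sketch below shows the general idea of configuration-driven ingestion: a hypothetical per-source HOCON file is parsed with the pyhocon library and handed to a generic PySpark reader. The config schema, source names, paths, and options are assumptions for the example, not the framework's actual format.

# Illustrative sketch only: a generic, HOCON-driven Spark ingestion job.
# The config schema, source names, and paths below are hypothetical.
from pyhocon import ConfigFactory
from pyspark.sql import SparkSession

# Example per-source HOCON config (normally kept in its own .conf file).
# Credentials and JDBC drivers are elided for brevity.
SOURCE_CONF = """
source {
  name   = "orders_postgres"
  format = "jdbc"
  options {
    url     = "jdbc:postgresql://db-host:5432/orders"
    dbtable = "public.orders"
    user    = "reader"
  }
}
sink {
  format = "parquet"
  path   = "s3://datalake/raw/orders/"
  mode   = "append"
}
"""

def run_ingestion(conf_text: str) -> None:
    conf = ConfigFactory.parse_string(conf_text)
    spark = SparkSession.builder.appName(conf["source.name"]).getOrCreate()

    # Build a reader generically from whatever options the config declares.
    reader = spark.read.format(conf["source.format"])
    for key, value in conf["source.options"].items():
        reader = reader.option(key, value)
    df = reader.load()

    # Write to the configured sink; every source reuses this same code path.
    df.write.mode(conf["sink.mode"]).format(conf["sink.format"]).save(conf["sink.path"])

if __name__ == "__main__":
    run_ingestion(SOURCE_CONF)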

Main projects:

- Deployment of tools such as Airbyte, Prometheus, and Grafana, as well as custom services.

- Implementation of Consul in both frameworks, along with its infrastructure in Kubernetes.

- Creation of data pipelines from multiple sources: APIs, MongoDB, Bigtable, and MySQL.

- Re-architecture of the entire job-execution infrastructure, migrating from Spark on Kubernetes to Dataproc.

- Creation of a monitoring ecosystem using OpenTelemetry (a rough sketch follows this list).
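
A rough illustration of that monitoring piece (not the platform's actual code): a job exports a custom ingestion counter through the OpenTelemetry Python SDK to an OTLP endpoint, such as a collector whose metrics Prometheus can scrape and Grafana can chart. The metric name, attributes, and endpoint are assumptions.

# Illustrative only: exporting a custom ingestion metric with the
# OpenTelemetry Python SDK. The metric name, attributes, and collector
# endpoint are hypothetical.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Push metrics periodically to an OTLP-compatible collector.
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://otel-collector:4317", insecure=True)
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("data-platform.ingestion")
records_ingested = meter.create_counter(
    "records_ingested",
    unit="1",
    description="Records successfully ingested per source",
)

# Inside an ingestion job, record progress tagged with the source name.
records_ingested.add(50_000, {"source": "mongodb", "pipeline": "raw"})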

Data Engineer
Hash | Nov 2020 - Jan 2022
Python | ETL | Apache Spark | Kubernetes

I worked on a small Data Platform team, building a scalable, automated platform able to keep up with the company's growth. In 2020 the company grew 10x, and for 2021 we expected a 20x increase, which brought both technical and cultural challenges. Beyond building Looker models and views, writing ETL jobs in PySpark, and maintaining the legacy infrastructure, I also actively promoted a data culture by giving analytics classes to other teams.
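
As a hedged example of the kind of PySpark ETL job described above, the sketch below reads a raw dataset, aggregates it, and writes a curated table for Looker to query; the dataset names, columns, and paths are invented for illustration.

# Minimal illustrative PySpark ETL job; dataset names, columns, and paths
# are hypothetical and not taken from the actual platform.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_payments_etl").getOrCreate()

# Extract: read raw events landed by the ingestion layer.
raw = spark.read.parquet("s3://datalake/raw/payments/")

# Transform: keep approved payments and aggregate daily totals per merchant.
daily_totals = (
    raw.filter(F.col("status") == "approved")
       .withColumn("day", F.to_date("created_at"))
       .groupBy("day", "merchant_id")
       .agg(F.sum("amount").alias("total_amount"),
            F.count("*").alias("payment_count"))
)

# Load: write the curated table, partitioned by day, for downstream BI use.
daily_totals.write.mode("overwrite").partitionBy("day").parquet(
    "s3://datalake/curated/daily_payment_totals/"
)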

Main projects:

- Deployment of Airflow on a Kubernetes infrastructure.

- Structuring of the data lake layers.

- Deployment of Debezium.

- Ingestion API.

- Metrics API.

Education

Universidade Federal Fluminense
Bachelor's degree・Computing Systems
Mar 2016 - Dec 2019