Juliana Torrisi, Recruiter, is in direct contact with the company and can answer any questions you may have.
Senior Kafka / Mobile Platform Engineer
About the role
Own our event-driven architecture and streaming platform. You’ll design, operate, and scale Kafka-based pipelines that power multi-region microservices and real-time features for a high-traffic mobile product.
Responsibilities
- Design topic topology, partitioning, keys, retention, and DLQ strategies for high scale.
- Build and operate Kafka clusters (AWS MSK or self-managed on EKS); plan capacity, scaling, DR, and multi-region replication (e.g., MirrorMaker 2).
- Implement robust producers/consumers (idempotency, exactly-once/at-least-once semantics, retries, backoff).
- Establish data contracts & schema governance (Avro/Protobuf + Schema Registry); enforce versioning.
- Create streaming pipelines and connectors (e.g., Debezium CDC from PostgreSQL/Aurora) and real-time processing (experience with ksqlDB, Flink, or Spark Streaming is a nice-to-have).
- Set up observability (Prometheus, Grafana, ELK) with SLOs, alerting, runbooks, and incident response.
- Partner with backend teams (NestJS services) on event models, API boundaries, and performance.
- Harden security (ACLs/RBAC, encryption, secrets) and optimize cost and performance in AWS.
- Contribute to CI/CD for streaming apps and connectors, including blue/green or canary rollouts.
Requirements
- 5+ years building distributed systems; 3+ years running Kafka in production at scale.
- Deep knowledge of Kafka internals (brokers, ZooKeeper/KRaft, consumer groups, offsets, compaction).
- Experience with AWS (MSK, IAM, networking, multi-region patterns) and Kubernetes (EKS).
- Strong in one or more languages for stream apps (TypeScript/Node, Java, Go, or Python).
- Solid PostgreSQL/Aurora understanding (esp. for CDC and downstream consumption).
- Excellent debugging and performance-tuning skills in distributed environments.
Nice to Have
- ksqlDB/Flink/Spark Streaming; Kafka Connect at scale.
- IaC (Terraform), GitOps, and service mesh (Istio/Linkerd).
- Integrating Lambda in streaming workflows; familiarity with GPU workloads (Runpod) is a plus.
- GraphQL familiarity for downstream consumers.