Founding Engineer, Distributed Systems
Location: Fully remote
Contract: 250 EUR per man-day (MD)
Language: Proficient English
Availability: ASAP
About the project: An American startup building cost-attribution and observability infrastructure for AI workflows.
Core Responsibilities:
- Data Architecture Ownership: Own ingestion, computation, and storage, along with the systems that determine dashboard performance and billing accuracy.
- System Design:
  * Ingest high-volume event data from AI workflows.
  * Manage duplicates, late arrivals, and partial failures (idempotency).
  * Compute costs and perform historical recomputations.
  * Aggregate and serve time-based analytics.
  * Scale from thousands to millions of workflow runs.
  * Maintain strict tenant isolation in shared environments.
  * Evolve systems from batch to streaming while maintaining contracts.
Requirements:
Technical Environment:
- Languages: Go, Python (async/concurrency focus).
- Storage: PostgreSQL (multi-tenant SaaS data models).
- Infrastructure: AWS, Auth0, Stripe.
- Potential Future Stack: Streaming systems (Kafka/Kinesis), time-series/OLAP databases, batch processing (Spark).
Preferred Experience:
- Kafka, Kinesis, or other streaming systems.
- Time-series databases or OLAP systems.
- Spark or large-scale recomputation pipelines.
- Familiarity with OpenTelemetry.
Operational Mindset:
- Preference for simple solutions over complexity.
- Focus on failure modes, data contracts, and backward compatibility.
- Ownership-driven approach with a focus on evolutionary design.