The role
Most of your time will be spent making our existing software more reliable. You'll dig into production bugs, trace them to root cause, and write the tests that should have caught them. You'll review pull requests in a way that raises the bar without demoralizing anyone. You'll tackle the tech debt that everyone complains about in retros but nobody picks up.
The other part — and this is growing — is our AI agent work. We're building tooling, orchestration, and evaluation systems for LLM-powered features. This isn't a research project. It's production software that needs the same rigor as everything else we ship, and we need someone who can bring it.
What you'll do
- Debug and fix issues across the stack, from staging through production
- Build meaningful test coverage (the kind that actually catches regressions)
- Review code thoroughly — with context, not just style nits
- Refactor the parts of the codebase that slow the team down
- Help define how we ship: observability, release process, quality gates
- Work with product and design to land features cleanly
- Contribute to our AI agent infrastructure: tool-calling, orchestration, evaluation, safety
What we're looking for
- 3–6 years of professional software engineering experience
- Real debugging experience in distributed systems — you've traced issues across services, not just stepped through a single process
- A habit of writing tests, ideally born from experience rather than mandate
- Thoughtful code review skills — you give feedback people learn from
- Comfort working in an existing codebase you didn't write
- Working understanding of LLM basics: prompting, context windows, hallucination, evaluation
- Clear communication with both technical and non-technical teammates
Nice to have
- Hands-on experience with AI agents, tool-calling, or workflow orchestration
- Familiarity with RAG, embeddings, vector databases, or model eval pipelines
- Observability chops (logs, metrics, tracing, incident response)
- CI/CD experience and opinions about what belongs in a quality gate
- A history of writing internal documentation that people actually reference