CLI Test and Integration Engineer - Perm - US/UK/Western Europe

Location

Remote restrictions apply

Salary

US$130K - 180K

Min. experience

5+ years

Full-time role

Posted a month ago

CLI Test and Integration Engineer (Chaos testing, integration tests)
Full-Time · Remote or Hybrid · High-Impact Role

About Odyn

Odyn is at the forefront of AI innovation, building transformative AI solutions through cutting-edge,
high-performance infrastructure. We're seeking a CLI Test and Integration Engineer to design and
execute chaos engineering experiments, build integration test suites, and ensure our GPU
infrastructure withstands real-world failure scenarios.

What You'll Do

Chaos Engineering & Testing
● Build hypothesis-driven chaos experiments using Gremlin, Chaos Monkey, LitmusChaos, or
AWS FIS to inject controlled failures across GPU infrastructure, schedulers, API gateways,
and storage layers.
● Design automated integration tests for distributed AI infrastructure components and
end-to-end workflows.
● Build CLI testing frameworks for developer and operator tools, validating behavior across
environments and edge cases.
CI/CD & System Validation
● Embed chaos, integration, and CLI tests into CI/CD pipelines (GitHub Actions, GitLab CI,
ArgoCD, Jenkins) with intelligent orchestration and automated rollback.
● Test platform behavior under network partitions, node failures, high-load scenarios, and
degraded performance.
● Validate failover mechanisms, data replication, and observability systems during failures.
Collaboration & Culture
● Partner with SRE, infrastructure, and backend teams to improve system resilience and
testability.
● Conduct architecture reviews to identify weaknesses and create incident response
documentation.

What We're Looking For

Must-Have

● 5–7+ years in test automation, chaos engineering, SRE, or distributed systems testing.
● Hands-on chaos engineering experience (Gremlin, Chaos Monkey, LitmusChaos, AWS FIS).
● Strong integration testing experience with distributed systems and cloud-native architectures.
● Proficiency in Python and/or Go; deep experience with pytest, Robot Framework, Playwright,
or similar.
● Kubernetes expertise and cloud platform experience (AWS/GCP/Azure).
● CI/CD pipeline integration and strong Linux/Unix skills.

Nice-to-Have

● GPU workload, HPC, or AI/ML infrastructure testing experience.
● High-performance networking (InfiniBand, RoCE, NVLink) or GPU schedulers (Kubernetes,
Slurm, Ray).
● Observability stacks (Prometheus, Grafana, OpenTelemetry) or infrastructure-as-code
(Terraform, Ansible).
● Prior experience at Netflix, Google, AWS, or AI infrastructure startups.

Why Join Us

● Shape the reliability of a cutting-edge AI infrastructure platform from the ground up.
● Work at the frontier of chaos engineering applied to GPU infrastructure and distributed AI
systems.
● Collaborate with world-class SRE and infrastructure teams.
● Competitive compensation + remote flexibility.

We strongly encourage applications from candidates with chaos engineering or distributed systems
testing experience on GPU clusters, Kubernetes, or AI/ML platforms.