Role Overview
We are seeking talented AWS DevOps Engineers to join our team in building a large-scale benchmark that tests the limits of leading AI models. This role is distinct from traditional DevOps positions because it focuses on designing complex, adversarial cloud infrastructure tasks that challenge AI agents to solve difficult AWS problems with precision, security, and reliability.
Responsibilities
- Design intricate and adversarial AWS infrastructure tasks to evaluate AI capabilities.
- Develop realistic DevOps scenarios that include security constraints, dependencies, edge cases, and failure conditions.
- Write precise task specifications that define desired infrastructure outcomes.
- Create idempotent reference solutions using Terraform or other AWS Infrastructure-as-Code tools.
- Develop automated graders and validation scripts in Python to assess AI agent performance.
- Validate infrastructure through AWS APIs, CLI outputs, Terraform state, and system behavior.
- Craft tasks that detect incomplete, unsafe, or superficially correct solutions.
- Ensure tasks are challenging enough to thoroughly test advanced AI models.
- Review and quality-check tasks from other engineers for accuracy and difficulty.
- Document task intent, assumptions, expected outcomes, edge cases, and scoring rationale.
Required Skills
- 4+ years of experience in DevOps, cloud infrastructure, platform engineering, or site reliability engineering.
- Extensive hands-on AWS experience across networking, IAM, compute, storage, databases, security, and cloud operations.
- Advanced expertise in Infrastructure-as-Code, particularly with Terraform.
- Proficiency in Python for building automated graders and validation tools.
- Experience in testing and validating infrastructure through APIs, CLIs, or automated frameworks.
- Strong understanding of secure, reliable, and reproducible infrastructure.
- Ability to design complex technical problems with ambiguity and edge cases.
- Attention to detail in identifying unsafe or incomplete solutions.
- Excellent written communication and technical documentation skills.
- Ability to work independently in a structured, task-based environment.
Nice to Have
- Experience with Pulumi, AWS CDK, or AWS CloudFormation.
- Familiarity with boto3, pytest, Terratest, LocalStack, or similar tools.
- Background in security engineering, chaos engineering, incident response, or SRE.
- Experience with AI evaluation, benchmarking, or red teaming.
- Knowledge of AI coding agents and common AI model failure modes.
- Experience designing technical assessments or infrastructure exercises.
- Experience reviewing engineers' work and maintaining quality standards.