Senior Software Engineer — AI Evaluation & Benchmarks

Location: Remote (restrictions apply)
Salary Estimate: N/A
Seniority: Senior
Tech stacks: Software Development, Data (+27 more)
Contract role
Posted 5 hours ago

About The Role

What if the code you write could determine how smart the next generation of AI truly is? We're looking for experienced Software Engineers to design and build the coding benchmarks and data pipelines used to evaluate frontier AI models — the systems that decide whether an AI can actually reason, debug, and write production-quality software.

This is high-impact, technically demanding work at the intersection of software engineering and AI research. You'll work with large codebases, multiple programming languages, and scalable infrastructure to create evaluation systems that push the boundaries of what AI can do.

This is a fully remote contract role. If you thrive in fast-paced engineering environments and want your work to directly shape the trajectory of AI, this is the role for you.

Organization: Alignerr
Type: Hourly Contract
Location: Remote
Contract Length: 3 Months
Commitment: Full-time availability preferred

What You'll Do

Design and implement coding benchmarks used to evaluate frontier AI models across real-world programming tasks
Build and maintain scalable data pipelines for AI evaluation workflows
Analyze AI-generated code for correctness, reliability, and edge-case failures
Create structured evaluation scenarios that rigorously test reasoning, debugging, and code quality
Work with large code repositories and multi-language environments
Collaborate on systems that improve how AI models understand and generate software
Provide detailed technical feedback on model performance and failure patterns
Contribute to the design of evaluation frameworks that set industry standards

Who You Are

4+ years of professional software engineering experience — this is non-negotiable
Experience working at a high-growth tech company or top-tier software organization
Expert proficiency in Python — you write clean, performant, well-tested Python code
Hands-on experience with code repositories and working in large, complex codebases
Proven experience designing and implementing LLM coding benchmarks and data pipelines
Track record of working in high-performance engineering environments with large-scale products or platforms
Strong command of version control systems (Git) and modern development workflows
Native or bilingual-level English proficiency, with strong written communication skills
Self-directed, technically rigorous, and comfortable operating with autonomy

What Makes a Perfect Match

Candidates with these additional qualifications have the highest chance of success:

Senior or Lead-level engineering profiles with a history of technical ownership
Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field — or equivalent professional experience
Proficiency in one or more additional languages, such as JavaScript, Go, or C++
Experience with CI/CD pipelines and writing robust unit tests (pytest, Mocha, JUnit)
Background in security engineering or significant open-source contributions
Familiarity with AI/ML evaluation methodologies or model benchmarking

Why Join Us

Work on cutting-edge AI evaluation projects alongside world-class research teams
Fully remote — work from anywhere with a reliable internet connection
Your benchmarks directly influence how the most advanced AI systems in the world are measured and improved
Freelance autonomy with meaningful, high-stakes engineering work
Collaborate with a global community of elite engineers and researchers
Potential for contract extension and ongoing engagement as new evaluation challenges emerge
