Alignerr

Senior Software Engineer — AI Evaluation & Benchmarks

Location

Remote restrictions apply

Salary Estimate

N/A

Seniority

Senior

Tech stacks

Software Development
AI
Data

Contract role
a day ago

About The Role

What if the code you write could determine how smart the next generation of AI truly is? We're looking for experienced Software Engineers to design and build the coding benchmarks and data pipelines used to evaluate frontier AI models — the systems that decide whether an AI can actually reason, debug, and write production-quality software.

This is high-impact, technically demanding work at the intersection of software engineering and AI research. You'll work with large codebases, multiple programming languages, and scalable infrastructure to create evaluation systems that push the boundaries of what AI can do.

This is a fully remote contract role. If you thrive in fast-paced engineering environments and want your work to directly shape the trajectory of AI, this is the role for you.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Contract Length: 3 Months
  • Commitment: Full-time availability preferred

What You'll Do

  • Design and implement coding benchmarks used to evaluate frontier AI models across real-world programming tasks
  • Build and maintain scalable data pipelines for AI evaluation workflows
  • Analyze AI-generated code for correctness, reliability, and edge-case failures
  • Create structured evaluation scenarios that rigorously test reasoning, debugging, and code quality
  • Work with large code repositories and multi-language environments
  • Collaborate on systems that improve how AI models understand and generate software
  • Provide detailed technical feedback on model performance and failure patterns
  • Contribute to the design of evaluation frameworks that set industry standards

Who You Are

  • 4+ years of professional software engineering experience — this is non-negotiable
  • Experience working at a high-growth tech company or top-tier software organization
  • Expert proficiency in Python — you write clean, performant, well-tested Python code
  • Hands-on experience with code repositories and working in large, complex codebases
  • Proven experience designing and implementing LLM coding benchmarks and data pipelines
  • Track record of working in high-performance engineering environments with large-scale products or platforms
  • Strong command of version control systems (Git) and modern development workflows
  • Native or bilingual-level English fluency with strong written communication skills
  • Self-directed, technically rigorous, and comfortable operating with autonomy

What Makes a Perfect Match

Candidates with these additional qualifications have the highest chance of success:

  • Senior or Lead-level engineering profiles with a history of technical ownership
  • Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field — or equivalent professional experience
  • Proficiency in one or more additional languages, such as JavaScript, Go, or C++
  • Experience with CI/CD pipelines and writing robust unit tests (pytest, Mocha, JUnit)
  • Background in security engineering or significant open-source contributions
  • Familiarity with AI/ML evaluation methodologies or model benchmarking

Why Join Us

  • Work on cutting-edge AI evaluation projects alongside world-class research teams
  • Fully remote — work from anywhere with a reliable internet connection
  • Your benchmarks directly influence how the most advanced AI systems in the world are measured and improved
  • Freelance autonomy with meaningful, high-stakes engineering work
  • Collaborate with a global community of elite engineers and researchers
  • Potential for contract extension and ongoing engagement as new evaluation challenges emerge

© Copyright 2026 Arc