Umbrex

Reliability Data Scientist

Location

Remote restrictions apply

Salary Estimate

N/A

Seniority

N/A

Tech stacks

Python
SQL
Data

Visa

U.S. visa required

Contract role
a day ago

Our client is looking for a Reliability Data Scientist to design evaluation scenarios, datasets, and metrics that reveal real risks in production AI. This role sits at the intersection of data science, evaluation design, and AI monitoring—supporting reliability dashboards, weekly reports, and client triage workflows.

Key Responsibilities:

  • Design evaluation scenarios and metric frameworks to assess AI quality, suitability, reliability, and context-dependent behavior.
  • Build and maintain evaluation assets including datasets, golden traces, error taxonomies, and automated scoring/aggregation pipelines in partnership with engineering.
  • Develop and manage weekly reliability dashboards and automated reports, translating monitoring data into clear insights.
  • Analyze evaluation results to detect drift, outliers, context-driven failures, and calibration issues—validating evaluator reliability against human judgments.
  • Document test logic, metric definitions, and interpretation guidance, and support context-engineering workflows with metrics for predictability, observability, and directability.

The ideal candidate has 3–6 years of experience and brings:

  • Strong Python + SQL + data-wrangling skills
  • Hands-on experience with evaluation design, sampling, and calibration
  • Comfort with dashboards (Grafana, PowerBI, or similar)
  • Experience building golden datasets and structured evaluation traces
  • Exposure to LLM or AI system evaluation (preferred)
  • Experience in regulated industries (audit, finance, healthcare) is a plus
  • Excellent communication — ability to turn technical data into decision-ready insights

Start Date: December 2025

Duration: 4-6 months

Time Commitment: ~20 hours/week

Location: Remote, in the U.S.

Expected rate: US$100-$120 per hour

Project ID#: 8021

This is a contract role and does not offer health benefits.

About Umbrex

👥11-50
📍Los Angeles

© Copyright 2025 Arc