Data Scientist (Python / LLM / MLOps) - Full Time - US Only

Location

Remote restrictions apply

Salary

US$80K - 160K

Min. experience

5+ years

Required skills

Full-time role

Posted 25 days ago

Actively recruiting / 90 applicants

We’re here to help you

Wilson Bittencourt is in direct contact with the company and can answer any questions you may have.

Wilson Bittencourt, Recruiter

The Mission

Turn messy public-record & web data into trustworthy mass-tort signals - via rapid EDA, rigorous modeling, and production-grade APIs - so lawyers act on facts, not hunches.

What You’ll Own

EDA Autopilot - Grab raw JSON/CSV/HTML, profile it, spot outliers, and surface “aha” patterns - without waiting for a PM to ask.
Model Builder - Train and tune classification / ranking models (typical classifiers, light ML, LLM-based RAG) that lift recall & precision week-over-week.
API Integrator - Package models behind FastAPI endpoints, validate / marshal schemas with Pydantic, and push to GitHub Actions CI in a day.
MLOps Wrangler - Monitor drift, schedule batch recall, and write lightweight tests.
Insight Storyteller - Ship clear notebooks / dashboards & concise Loom walk-throughs that legal SMEs grok in minutes.
Startup Swiss-Army Knife - Spot gaps (data gaps, labeling gaps, infra gaps) and patch them before anyone asks. Ambiguity is the default.
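
For candidates wondering what "spot outliers" and "monitor drift" might look like in practice, here is a minimal standard-library sketch. It is an illustration only, not the team's actual tooling: the function names, the 1.5 x IQR fence, and the choice of Population Stability Index (PSI) as the drift metric are all assumptions made for this example.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside the [Q1 - k*IQR, Q3 + k*IQR] fences (k=1.5 is a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def psi(expected, actual, bins=5):
    """Population Stability Index between a baseline sample and a newer sample.

    Bins are equal-width over the baseline's range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical numbers, e.g. weekly complaint counts for one product.
weekly_counts = [10, 12, 11, 13, 12, 11, 95]
suspicious = iqr_outliers(weekly_counts)

baseline = [float(x) for x in range(100)]
shifted = [x + 40.0 for x in baseline]
drift_score = psi(baseline, shifted)
```

A PSI of zero means the two samples fall into the bins in identical proportions; what score counts as "drift" is a judgment call that depends on the feed.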

What Success Looks Like

Week 4 - First exploratory notebook flags a recall-worthy defect the founders didn’t see.
Week 6 - A FastAPI route serving the trained model hits < 300 ms P95 latency in prod.
Quarter 1 - Recall ↑ 15 pp and false-positive rate ↓ 10 pp on live web-scrape feed; zero pager-alerts.
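
The targets above are stated in percentage points (pp) and as a P95 latency, so it may help to see how such numbers are typically computed. The sketch below is illustrative only: the function names and the sample counts are hypothetical, and the nearest-rank percentile is one common convention, not necessarily the one used in production.

```python
def recall(tp, fn):
    """Share of real positives the model catches: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    """Share of real negatives wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def pp_change(before, after):
    """Absolute change in percentage points (not a relative percent change)."""
    return (after - before) * 100


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Hypothetical before/after confusion counts.
recall_gain = pp_change(recall(tp=60, fn=40), recall(tp=75, fn=25))
fpr_drop = pp_change(false_positive_rate(fp=30, tn=70),
                     false_positive_rate(fp=20, tn=80))
latency_p95 = p95(list(range(1, 101)))  # samples of 1..100 ms
```

With these made-up counts, recall moves from 60% to 75% (a gain of about 15 pp) and the false-positive rate from 30% to 20% (a drop of about 10 pp), which matches the shape of the Quarter 1 goal.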

Your Toolkit

2-4 yrs Python data science: pandas/Polars, Astral stack, PyTorch/TF.
Comfort spinning up FastAPI + Pydantic micro-services.
Familiarity with using rate-limited LLMs to augment and clean existing datasets.
Solid SQL & object-storage chops (Postgres, DuckDB, S3).
CI/CD familiarity (GitHub Actions or similar); basic IaC a plus.
You document & demo your work proactively - no babysitting required.

Nice-to-Haves

Prior scraping work (Scrapy, Playwright) or PACER/NHTSA/FDA datasets.
Experience with vector DBs (Qdrant, pgvector) & prompt-engineering.
Exposure to SOC 2 or other regulated-data environments.

Interview Process

15-minute initial overview call
~2-3 hour take-home assessment with dataset provided; EDA and written communication expected
1-hour pair-programming assessment with CTO, with screenshare and agent use
30-minute Q&A with founding team

Why Join

Green-field ML canvas, instant customer feedback, and exposure to leadership in a venture-backed startup already generating revenue and real impact in legal tech. Shape how mass-tort intelligence is built in the AI era.

Ready to build? Apply today.