Applied AI Data Scientist

Location

Remote restrictions apply

Tech stacks

Data

Data Science

Permanent role

7 months ago

Applied AI Data Scientist – Document Intelligence, NLP & Analytics

Location: Fully Remote

Terms: Full Time / Direct-Hire

Salary: $120K - $170K base

Summary

We’re looking for an Applied Data Scientist who loves turning messy, real-world data into elegant, production-ready intelligence. This role sits at the intersection of Document Intelligence, Natural Language Processing (NLP), and Analytics. You’ll work on some of our most impactful systems—automating document ingestion, normalizing complex census data, and enabling natural-language Q&A over company performance and operational datasets.

If you’re motivated by teaching machines to read documents, turning PDFs, spreadsheets, and chaos into clean structured data, and letting users ask business questions in plain English (and actually get smart answers), you’ll feel right at home here.

Responsibilities

Document Intelligence & Automation

Design and build document ingestion and intelligence pipelines (PDFs, Excel, scanned docs, structured + unstructured data)
Implement OCR, document classification, key-value extraction, and layout-aware NLP
Continuously improve accuracy through annotation, evaluation, and feedback loops

Census Data Normalization & Entity Resolution

Normalize, validate, and enrich large-scale census and enrollment datasets
Build intelligent matching, deduplication, and entity-resolution logic
Handle real-world edge cases (because real data is never clean)

NLP & AI-Powered Analytics

Develop natural-language Q&A over structured and semi-structured performance data
Build embeddings, retrieval pipelines, and semantic search
Translate business questions into analytical insights that actually matter

Production AI (Not Just Notebooks)

Move models from notebook → pipeline → production
Collaborate with Engineering, Product, and DevOps to deploy scalable solutions
Monitor model performance, drift, and data quality over time

Requirements (must have)

5+ years of experience in Data Science, Applied ML, or AI-focused roles
Proven track record of deploying real AI systems (not just experiments)
Comfortable operating with autonomy, ownership, and a bias toward action
Bachelor’s degree in Data Science, Computer Science, Mathematics, Engineering, or a related field preferred
Strong experience with Python for data science and ML
Hands-on experience with NLP (transformers, embeddings, semantic search, text classification)
Experience with document intelligence / OCR (e.g., Textract, Azure Form Recognizer, Google Document AI, or similar)
Solid understanding of data pipelines, ETL, and orchestration (Airflow or equivalent)
Experience working with SQL-based data stores and data warehouses
Familiarity with ML lifecycle concepts (training, evaluation, versioning, deployment)

What makes a great fit

You think in systems, not just models
You’re curious, pragmatic, and allergic to over-engineering
You enjoy collaborating with Product and Engineering—not hiding behind notebooks
You like moving fast, learning fast, and shipping meaningful work
You can explain complex ideas to non-technical teammates without sounding like a robot
You'll work on real problems with real impact—no vanity AI
You prefer an environment with high ownership, low bureaucracy, and zero red tape
You’ll work with smart, driven teammates who care about craft and culture
Competitive compensation + startup upside
Any experience with GenAI platforms (ChatGPT, OpenAI APIs, Claude, etc.) preferred
Knowledge of MLOps tools and workflows (SageMaker, MLflow, CI/CD for ML) would be great
Experience with entity resolution, fuzzy matching, or record linkage preferred
Data visualization experience (Power BI, D3, or similar) would be great
Exposure to regulated or complex data domains (insurance, healthcare, finance) would be great