Applied AI Data Scientist – Document Intelligence, NLP & Analytics
Location: Fully Remote
Terms: Full Time / Direct-Hire
Salary: $120K - $170K base
Summary
We’re looking for an Applied Data Scientist who loves turning messy, real-world data into elegant, production-ready intelligence. This role sits at the intersection of Document Intelligence, Natural Language Processing (NLP), and Analytics. You’ll work on some of our most impactful systems—automating document ingestion, normalizing complex census data, and enabling natural-language Q&A over company performance and operational datasets.
If you're excited about teaching machines to read documents, turning PDFs, spreadsheets, and chaos into clean structured data, and letting users ask business questions in plain English (and actually get smart answers), you’ll feel right at home here.
Responsibilities
Document Intelligence & Automation
- Design and build document ingestion and intelligence pipelines (PDFs, Excel, scanned docs, structured + unstructured data)
- Implement OCR, document classification, key-value extraction, and layout-aware NLP
- Continuously improve accuracy through annotation, evaluation, and feedback loops
Census Data Normalization & Entity Resolution
- Normalize, validate, and enrich large-scale census and enrollment datasets
- Build intelligent matching, deduplication, and entity-resolution logic
- Handle real-world edge cases (because real data is never clean)
NLP & AI-Powered Analytics
- Develop natural-language Q&A over structured and semi-structured performance data
- Build embeddings, retrieval pipelines, and semantic search
- Translate business questions into analytical insights that actually matter
Production AI (Not Just Notebooks)
- Move models from notebook → pipeline → production
- Collaborate with Engineering, Product, and DevOps to deploy scalable solutions
- Monitor model performance, drift, and data quality over time
Requirements (must have)
- 5+ years of experience in Data Science, Applied ML, or AI-focused roles
- Proven track record of deploying real AI systems (not just experiments)
- Comfortable operating with autonomy, ownership, and a bias toward action
- Bachelor’s degree in Data Science, Computer Science, Mathematics, Engineering, or a related field preferred
- Strong experience with Python for data science and ML
- Hands-on experience with NLP (transformers, embeddings, semantic search, text classification)
- Experience with document intelligence / OCR (e.g., Textract, Azure Form Recognizer, Google Document AI, or similar)
- Solid understanding of data pipelines, ETL, and orchestration (Airflow or equivalent)
- Experience working with SQL-based data stores and data warehouses
- Familiarity with ML lifecycle concepts (training, evaluation, versioning, deployment)
What makes a great fit
- You think in systems, not just models
- You’re curious, pragmatic, and allergic to over-engineering
- You enjoy collaborating with Product and Engineering—not hiding behind notebooks
- You like moving fast, learning fast, and shipping meaningful work
- You can explain complex ideas to non-technical teammates without sounding like a robot
What we offer
- Real problems with real impact, no vanity AI
- High ownership, low bureaucracy, zero red tape
- Smart, driven teammates who care about craft and culture
- Competitive compensation + startup upside
Nice to have
- Experience with GenAI platforms (ChatGPT, OpenAI APIs, Claude, etc.)
- Knowledge of MLOps tools and workflows (SageMaker, MLflow, CI/CD for ML)
- Experience with entity resolution, fuzzy matching, or record linkage
- Data visualization experience (Power BI, D3, or similar)
- Exposure to regulated or complex data domains (insurance, healthcare, finance)