HireTalent - Staffing & Recruiting Firm

Data Scientist

Location

Remote restrictions apply

Salary Estimate

N/A

Seniority

N/A

Tech stacks

Data
Database
Data analytics

Contract role
4 days ago

Role Overview:

We are seeking a Senior Data Scientist to build and deploy LLM-based capabilities for working with large, diverse datasets and documents relevant to growth analytics and bid strategy. This role emphasizes ingestion, document processing, information extraction, and retrieval methods to support analytics use cases in production. Experience with modern LLM tooling and Databricks is required; hands-on experience with advanced reasoning models and agentic/orchestration frameworks is a plus.

Key Responsibilities:

  • Architect, build, and refine retrieval-grounded LLM systems, including basic and advanced RAG patterns, to deliver grounded, verifiable answers and insights.
  • Design robust pipelines for ingestion, transformation, and normalization of public and internal data, including ETL, incremental processing, and data quality checks.
  • Build and maintain document processing workflows across PDFs, HTML, and scanned content, including OCR, layout-aware parsing, table extraction, metadata enrichment, and document versioning.
  • Develop information extraction pipelines using LLM methods and best practices, including schema design, structured outputs, validation, error handling, and accuracy evaluation.
  • Own the retrieval stack end-to-end, including chunking strategies, embeddings, indexing, hybrid retrieval, reranking, filtering, and relevance tuning across a vector database or search platform.
  • Implement web data acquisition where needed, including scraping, change detection, source quality checks, and operational safeguards like retries and rate limiting.
  • Establish evaluation and monitoring practices for retrieval and extraction quality, including golden datasets, regression testing, groundedness checks, and production observability.
  • Collaborate with subject matter experts to translate business needs into practical retrieval and extraction workflows and measurable success criteria.
  • Communicate complex findings, tradeoffs, and recommendations to technical and business stakeholders, supporting data-driven forecasting and strategy.
  • Ensure compliance with data governance and security standards when handling sensitive data and deploying systems to production environments.

Qualifications:

  • Advanced degree in Computer Science, Data Science, Statistics, Engineering, or a related quantitative field.
  • Minimum of 4 years of experience in data science or applied ML/NLP, with a focus on NLP and GenAI.
  • Proficiency in Python and SQL, with strong engineering practices for maintainable, testable pipelines.
  • Strong experience with Databricks for data processing and pipeline development, including Spark and common Lakehouse patterns.
  • Demonstrated experience building retrieval-grounded LLM systems and/or LLM-based information extraction for real-world use cases.
  • Experience with document ingestion and parsing, including OCR and handling messy, semi-structured content such as PDFs, tables, forms, and web pages.
  • Familiarity with vector databases and retrieval concepts, including indexing, embeddings, hybrid retrieval, reranking, and performance and cost tuning.
  • Strong understanding of best practices for reasoning models and techniques that improve reliability and reduce hallucinations, including grounding and attribution.
  • Excellent communication skills, with a track record of partnering with stakeholders and turning ambiguous requests into adopted solutions.

Libraries and Tools:

  • Proficiency with LLM and orchestration libraries such as OpenAI, Google GenAI, LangGraph, and LangChain.
  • Experience with supporting tooling commonly used in production LLM systems, for example: Pydantic for schema validation, tenacity for retries, beautifulsoup4 for HTML data extraction, and standard Python data tooling such as pandas and NumPy.
  • Experience with retrieval and vector tooling, such as FAISS, Elasticsearch or OpenSearch, and vector database platforms (for example, Pinecone, Weaviate, Milvus, Chroma).

Preferred Qualifications:

  • Exposure to agentic patterns and tool-calling for workflow automation.
  • Experience working in regulated environments and implementing governance controls such as access control, auditability, and retention.

© Copyright 2025 Arc