About Forage AI: Forage AI delivers large-scale data collection and processing platforms, including web crawlers, document parsers, data pipelines, AI/agentic solutions, and AI-assisted workflows.
Our primary programming language is Python. We design both cloud-native and cloud-agnostic solutions, primarily on AWS, while also working with GCP and Azure. We value high ownership, strong collaboration, and pragmatic, well-documented engineering.
Role Overview: We are looking for a hands-on Engineering Lead who combines deep Python expertise with
proven experience building high-scale web scraping and data collection systems. You will own
the full lifecycle of our data acquisition platform — from architecting resilient crawlers and
parsers to leading a team of engineers and ensuring production reliability. Cloud/DevOps
experience is valued but not the primary focus; your core strength is in Python craftsmanship
and data extraction at scale.
Key Responsibilities
Engineering Leadership
1. Own end-to-end delivery across requirements, design, implementation, testing, and
production operations.
2. Lead and grow a team of engineers through mentorship, code reviews, pairing on hard
problems, and raising overall engineering quality.
3. Translate business and client needs into clear technical plans; manage risks, trade-offs,
and timelines.
4. Establish and enforce engineering best practices: branching strategy, code standards,
testing discipline, and incident/RCA processes.
Python & Scraping Excellence
5. Architect and build scalable, fault-tolerant crawling and parsing systems (Scrapy,
Selenium, Playwright, or equivalent frameworks).
6. Write production-grade Python and set the bar via reference implementations, design
docs, and thorough code reviews.
7. Build robust mechanisms for anti-bot evasion, proxy rotation, rate limiting, and content
fingerprinting.
8. Design parsers and extractors for structured and unstructured content across diverse
and changing web targets.
Data Pipeline & Processing
9. Design and operate ETL/ELT pipelines for large-scale data enrichment, transformation,
and delivery.
10. Ensure data quality, consistency, and observability across the collection and processing
stack.
11. Drive pipeline reliability: monitoring, alerting, graceful failure handling, and recovery
strategies.
Infrastructure & Operations (Supporting Role)
12. Work with cloud infrastructure (primarily AWS) for deployment, scheduling, and storage
— own enough to be self-sufficient.
13. Collaborate on CI/CD, containerisation (Docker/Kubernetes), and environment
management.
14. Contribute to observability: structured logging, metrics, and alerting for scraping and
pipeline workloads.
Required Qualifications
• 6–8 years of software engineering experience, with meaningful team or project
leadership in a production environment.
• Expert-level Python: strong command of async/concurrency patterns, memory
management, packaging, and writing idiomatic, maintainable code.
• Hands-on production experience with web scraping frameworks (Scrapy, Selenium, or
Playwright) at meaningful scale.
• Deep understanding of web fundamentals (HTTP/S, cookies, sessions, headers, JavaScript
rendering) and browser automation.
• Experience designing and operating resilient crawlers: retry logic, deduplication,
scheduling, and distributed crawl management.
• Solid SQL and NoSQL experience: schema design, query optimisation, and working with
unstructured/semi-structured data.
• Proven ability to lead engineers: code reviews, technical mentorship, driving delivery
discipline and engineering quality.
• Strong data structures and algorithms fundamentals; ability to reason clearly about system
design, performance, and correctness.
• Excellent communication: clear design documents, strong stakeholder communication,
and actionable technical feedback.
Preferred / Good to Have
Data Pipelines (High Priority)
• Experience building or operating data pipeline tooling — Airflow, Prefect, Luigi, or similar
orchestrators.
• Familiarity with streaming systems (Kafka, Kinesis) for real-time data ingestion
workflows.
• Experience with Spark or distributed compute frameworks for large-scale data
transformation.
Cloud & DevOps
• Working knowledge of AWS services relevant to data workloads: S3, Lambda, ECS,
SQS, RDS/DynamoDB.
• Familiarity with CI/CD (GitHub Actions, Jenkins) and containerised deployments (Docker,
Kubernetes).
• Basic infrastructure-as-code experience (Terraform or CloudFormation) is a plus, not a
requirement.
Additional Nice-to-Haves
• Familiarity with vector databases or semantic search systems.
• Security awareness: OWASP basics, secrets management, and least-privilege patterns.
• Experience with NLP/ML tooling for content extraction, classification, or enrichment.
• Hiring, interviewing, and onboarding experience; talent development mindset.
• Frontend/JS exposure (nice-to-have only; useful for understanding scraping targets).
What Sets You Apart
You are a craftsperson who takes pride in clean, performant Python. You understand the web
deeply: not just how to scrape it, but why it behaves the way it does. You've operated systems
that collect data at scale, and you know how to make them reliable. You lead by example,
elevate the people around you, and communicate clearly with both engineers and non-technical
stakeholders.
Work-from-Home Requirements: