Juliana Torrisi, Recruiter, is in direct contact with the company and can answer any questions you may have.
Role Overview
Own the design, build, and operation of scalable SERP scraping systems focused on Google Web, Google Images, and Google Video, and deliver high-volume, reliable, observable data pipelines.
Top Requirements (Must-Have)
- 10+ years developing SERP scrapers (Google Web/Images/Video), from architecture to production ops.
- Deep understanding of web protocols and TLS certificates; fluent in reverse-engineering techniques to bypass anti-bot mechanisms (headless detection, fingerprinting, client hints, CAPTCHA pathways).
- Proven ability to scale to hundreds of thousands or millions of requests per day, including proxy rotation and proxy pool management (residential/datacenter (DC)/mobile, sticky sessions, health scoring).
- Strong English communication: able to collaborate closely with engineering, write clear docs/runbooks, and present trade-offs.
Responsibilities
- Architect, implement, and run high-throughput SERP pipelines with robust retry/backoff, idempotency, and backpressure.
- Design anti-bot strategies (TLS/JA3/JARM tuning, identity/fingerprint variance, session management, smart wait heuristics).
- Build and tune browser automation (Playwright/Puppeteer/Selenium + Chrome DevTools Protocol) and HTTP clients for mixed headless/headful needs.
- Enforce data quality (schemas, dedupe, normalization); instrument observability (metrics, traces, alerts) and optimize proxy/compute costs.
- Produce clear runbooks and SLIs/SLOs, and partner with platform/data teams on storage and downstream processing.
Tech Environment (Indicative)
- Languages: Python, Node.js/TypeScript (Go a plus)
- Automation: Playwright / Puppeteer / Selenium + CDP
- Infra: Containers, queues/workers, distributed schedulers
- Observability: Prometheus/Grafana, ELK/OpenTelemetry
- Proxies: Residential/DC/mobile with rotation, health, and ASN/region mix
Nice to Have
- Experience mitigating Cloudflare/PerimeterX/Akamai bot protection.
- Data pipelines (Kafka/PubSub, Airflow/Argo, Spark/Flink).
- Cost modeling and provider diversification; i18n scraping.