Role Overview
We are seeking a skilled Web Scraping Engineer to develop approximately 50 Python-based scrapers for U.S. state government websites. Your primary objective will be to extract structured data from static HTML and PDF sources; some sites will also require browser automation to handle JavaScript-rendered content.
Responsibilities
- Design and implement efficient Python scrapers to extract data from static HTML and PDF sources.
- Develop solutions for sites requiring browser automation and JavaScript execution.
- Ensure scrapers can run unattended on a schedule for recurring data extraction.
- Collaborate with the team to refine examples and the target schema, and build on existing infrastructure.
- Participate in an initial pilot project covering a few states, with the potential to scale based on performance.
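For illustration only, the static-HTML portion of this work might look something like the sketch below. It uses Python's standard-library html.parser to turn a simple HTML table into a list of records keyed by column header. The names TableRowParser, parse_table, and SAMPLE_HTML, as well as the sample table contents, are hypothetical and invented for this example; the actual target schema and tooling will be agreed on during the pilot.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return one dict per data row, keyed by normalized header text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [h.lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, row)) for row in body]


# Hypothetical sample input; real pages would be fetched from the target site.
SAMPLE_HTML = """
<table>
  <tr><th>License Number</th><th>Name</th></tr>
  <tr><td>A-100</td><td>Example Co</td></tr>
  <tr><td>B-200</td><td>Sample LLC</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in parse_table(SAMPLE_HTML):
        print(record)
```

In practice a library such as Beautiful Soup or Scrapy would likely replace the hand-written parser; the standard library is used here only to keep the sketch dependency-free.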
Required Skills
- Proficiency in Python with a strong understanding of web scraping libraries such as Beautiful Soup, Scrapy, or Selenium.
- Experience extracting data from both static HTML and PDF sources.
- Knowledge of browser automation tools and techniques.
- Ability to write clean, efficient, and maintainable code.
Nice to Have
- Familiarity with extracting data from government websites.
- Experience with scheduling tools and automated workflows.