Job Summary:
Tekmetric is a cloud-based platform designed to help auto repair shops operate more efficiently and grow sustainably. They are seeking a Software Engineer with expertise in web scraping, data processing, and search technologies to build a large-scale data ingestion and classification system, working primarily with Python and various data technologies.
Responsibilities:
• Design and build large-scale, distributed crawling bots (possibly AI agents) and supporting infrastructure that operate in an adversarial environment with low operational overhead.
• Develop and maintain data pipelines to extract data from large volumes of web pages, documents, PDFs (OCR), and APIs.
• Help unify heterogeneous documents from varied source formats into a coherent data schema.
• Preprocess and normalize raw data for downstream classification, ML/NLP, and search indexing.
• Build APIs to expose structured, classified data via ElasticSearch/OpenSearch.
• Collaborate with ML/NLP teams to integrate classification models into the pipeline.
• Automate workflows using Apache Airflow and deploy solutions in Kubernetes on AWS.
• Optimize and scale data pipelines using Spark (EMR) for processing large datasets.
Qualifications:
Required:
• 3+ years of Python experience building crawling/scraping solutions at scale.
• Experience working with REST APIs and PDF processing (OCR, Tesseract, PyMuPDF, etc.).
• Proficiency in data processing & search technologies (ElasticSearch/OpenSearch, NoSQL/SQL databases).
• Hands-on experience with Airflow and Spark (EMR) or similar distributed systems.
• Strong problem-solving skills for handling anti-scraping mechanisms and data scaling challenges.
• Hands-on experience with AWS or GCP.
Preferred:
• Familiarity with NLP and Machine Learning (a plus but not required).
• Experience with LLMs, NLP models, or ML frameworks (e.g., Hugging Face, spaCy, TensorFlow, PyTorch).
• Prior experience in automated document classification.
• Experience working in high-scale, production environments with petabytes of data.
• Hands-on experience with Kubernetes.
Company:
Tekmetric's platform covers digital vehicle inspections, repair orders, inventory, job progress, customer communication, and much more. Founded in 2015, the company is headquartered in Houston, Texas, USA, and has a team of 51-200 employees. It is currently a growth-stage company.