Role Summary: Join our dynamic and innovative team dedicated to delivering cutting-edge solutions in web data extraction, AI integration, and software engineering. As a Software Engineer, you will collaborate with a talented group of engineers and data scientists to develop robust, scalable solutions while acting as a crucial link between multiple teams to drive project success.
Key Responsibilities:
- Act as a liaison between internal teams, QA teams, and external stakeholders to ensure seamless collaboration.
- Provide proactive communication to ensure alignment throughout the development cycle.
- Design and implement robust spiders and crawlers for large-scale web data extraction (a minimal example follows this list).
- Handle complex data structures and ensure optimal performance of web scraping solutions.
- Write clean, maintainable, and efficient Python code following industry best practices.
- Leverage expertise in databases, including both NoSQL and relational (RDBMS) systems.
- Manage sprint boards, tasks, and timelines to ensure on-time delivery.
- Facilitate Sprint Planning and Retrospectives, and perform peer reviews.
- Research candidate solutions together with Senior Developers on the team, and help perform unit and regression tests.
- Work internally with QA/Research teams and developers to create automated data and pipeline checks and validations.
- Get trained on the products and tools developed by the team, and help create and maintain their documentation.
- Collaborate with AI experts to integrate local AI models into project pipelines.
- Work with cloud platforms such as AWS or Azure for scalable deployment and infrastructure management.
- Utilize Docker for containerization and ensure smooth system integration.
- Build and optimize generic robots to extract data from various web and document formats.
- Craft scalable solutions to enhance web crawling strategies.
- Identify patterns in semi-structured data and build code to address exceptions.
- Manage live execution of robots, handling turnaround times, exceptions, QA, and delivery.
- Automate end-to-end project pipelines using Python.
- Apply project management best practices to ensure efficient task prioritization and resource management.
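For a concrete sense of the spider and crawler work described above, here is a minimal sketch using Scrapy against the public practice site quotes.toscrape.com; the selectors are specific to that site and would differ for each real target.

    import scrapy


    class QuotesSpider(scrapy.Spider):
        """Crawl a paginated listing and yield one structured item per entry."""
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]  # public scraping sandbox

        def parse(self, response):
            # Extract one record per listing block on the current page.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                    "tags": quote.css("div.tags a.tag::text").getall(),
                }
            # Follow pagination until there is no "next" link left.
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

Saved as quotes_spider.py, a standalone spider like this can be run without a full project scaffold via: scrapy runspider quotes_spider.py -o quotes.json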
Required Qualifications:
- Bachelor’s degree in Computer Science, Information Technology, or a related engineering discipline preferred.
- 3-5 years of experience in web crawling using Python.
- Hands-on experience with Python libraries such as Requests, Scrapy, Pandas, Urllib, or BeautifulSoup (BS4).
- Experience with API development is a plus.
- Proficiency with relational database systems (PostgreSQL, SQL Server, etc.).
- Knowledge of and exposure to AWS (including Lambda) and Docker.
- Experience creating and managing fully automated project pipelines using Python (a sketch of one such pipeline stage appears after this list).
- Familiarity with web-based automation tools (Selenium, Puppeteer, Mechanize, Render) is an advantage.
- Strong communication and collaboration skills, including excellent listening skills, and a proactive approach to problem-solving.
- Understanding of Generative AI concepts and AI agent ecosystems.
- Strong organizational skills for managing sprints and tasks.
- Project management experience, including task delegation and sprint execution.
- Experience implementing AI models within data pipelines is preferred.
- Certification in cloud platforms such as AWS or Azure is a plus.
- Knowledge of QA processes and frameworks is an advantage.
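As an illustration of the automated pipeline and data checks referenced above, here is a small pandas-based validation stage; the column names and thresholds are invented for the example and would be tailored to each project.

    import pandas as pd


    def validate_extract(df: pd.DataFrame) -> list[str]:
        """Return human-readable validation failures (an empty list means pass)."""
        errors = []
        required = ["url", "title", "price"]
        missing = [c for c in required if c not in df.columns]
        if missing:
            errors.append(f"missing columns: {missing}")
            return errors  # the remaining checks assume these columns exist
        if df["url"].duplicated().any():
            errors.append("duplicate URLs found")
        null_ratio = df["title"].isna().mean()
        if null_ratio > 0.05:  # tolerate up to 5% missing titles
            errors.append(f"too many null titles: {null_ratio:.1%}")
        if (pd.to_numeric(df["price"], errors="coerce") < 0).any():
            errors.append("negative prices found")
        return errors


    if __name__ == "__main__":
        sample = pd.DataFrame(
            {"url": ["a", "b"], "title": ["X", None], "price": [9.99, 4.5]}
        )
        for problem in validate_extract(sample):
            print("FAIL:", problem)

A check like this typically runs as one stage of the automated pipeline, failing the run or alerting QA before bad data reaches delivery.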
Other Infrastructure Requirements:
- High-speed internet connectivity for video calls and efficient work.
- Capable business-grade computer with a modern processor and at least 8 GB of RAM.
- Headphones with clear audio quality.
- Stable power supply, with backup options in case of internet or power failure.