At DevSavant, we are a trusted technology partner specializing in Software Development, Data Engineering, AI/Machine Learning, Cloud Solutions, Automation Testing, and UI/UX Design. We deliver innovative, high-quality solutions with a focus on excellence and results. Our people are at the heart of everything we do, fostering a culture of growth and well-being. Join us and thrive in a supportive, success-driven environment.
Our client is a data and analytics startup that aims to expand knowledge of subscriber behavior so brands can entertain, inspire, and empower the world. They provide industry and competitive benchmarks across key metrics so their customers can know ‘what good looks like’, market intelligence so they can learn how to build a best-in-class business, and insights to inform their strategic decision-making. They are quickly becoming the market standard for subscription analytics. They have found product/market fit, secured household names as clients, and are in ‘hyper-growth’ mode!
Job Summary
We're looking for a talented Data Scientist with expert Python skills and experience processing large volumes of data to join the team. You'll be a key player in designing, building, and scaling their core data pipelines and ML systems, which power the company's advanced analytics and machine learning models. You'll work closely with data scientists and engineers to build robust, efficient, and scalable systems. If you love solving complex technical problems, building production-ready data systems, and want to make a big impact at a data-driven company, this job is for you!
You will report to the Senior Manager, Data Science. Our client is a remote-first company, and we are looking for candidates who can work during US business hours.
What You’ll Do
- Design, develop, test, and maintain robust, scalable data pipelines using Python and large-scale data-processing frameworks (e.g., Spark or Dask on GCP).
- Design and own key components of the ML systems, ensuring they are reliable, efficient, and scalable.
- Establish and manage MLOps practices, including CI/CD for machine learning models, model monitoring, and automated deployment strategies.
- Optimize and manage data-processing jobs on GCP (Dataproc, BigQuery, Cloud Run, Cloud Build).
- Partner with data scientists to productionize machine learning models and integrate them with the data platform.
- Write clear documentation for the system designs, code, and services you build and maintain.
- Troubleshoot complex issues in distributed data systems and ML pipelines.
Who You Are
- You have 3-5+ years of software engineering experience, with a strong focus on data engineering, ML engineering, or building data-intensive applications.
- You are an expert in Python, with a strong grasp of object-oriented design and software architecture, and experience writing high-quality, testable, production-grade code.
- You have strong, hands-on experience with large-scale data-processing frameworks such as Apache Spark (PySpark) or Dask.
- You have solid experience with cloud platforms (GCP is highly preferred), including deploying, operating, and scaling services (e.g., Docker, Cloud Run, GKE) and working with large-scale data systems (e.g., Dataproc, BigQuery).
- You have strong SQL skills and experience working with large, complex datasets.
- You have a deep understanding of machine learning concepts, the end-to-end model development lifecycle, and MLOps principles.
- You are an excellent problem-solver, skilled at debugging complex issues in distributed systems and improving their performance and scalability.
- You communicate complex technical ideas and system design decisions clearly and effectively in English (advanced proficiency, B2-C1), with excellent teamwork and consulting skills.
- You are passionate about building robust, scalable systems and eager to mentor and collaborate with a team.
- You care deeply about code quality, system reliability, and writing good documentation.
Bonus
- Experience in or passion for the Subscription Economy, especially in media and entertainment.
- Deep knowledge of specific GCP services like Dataproc, Dataflow, Cloud Composer, Vertex AI, or Kubernetes Engine.
- Experience building and maintaining widely used Python libraries, or contributions to open-source projects.
- Advanced knowledge of MLOps tooling and workflow orchestration (e.g., Cloud Build, Cloud Run).
Tech Stack (Required Proficiency)
- Languages: Python (expert), SQL (strong)
- Large-Scale Data Processing: Apache Spark/PySpark (or similar, such as Dask)
- Cloud Platform: Google Cloud (Dataproc, BigQuery, Cloud Storage, Cloud Run, Cloud Build, GKE - strong experience expected)
- Version Control: Git (expert)
- MLOps & Orchestration: Familiarity with tools such as Airflow, Kubeflow, or Vertex AI Pipelines
- Containerization: Docker, Kubernetes
- Data Analysis Libraries: Pandas, NumPy (advanced)
- Machine Learning: scikit-learn, TensorFlow/PyTorch (experience taking models to production)
- AI Tools: Claude, Gemini, OpenAI offerings