We are seeking a Senior Machine Learning Engineer with 5-10 years of industry experience who thrives at the intersection of scalable ML systems, model observability, and business impact. In this role, you will develop and deploy cutting-edge machine learning models, including Large Language Models (LLMs) and foundational models, to solve business challenges. This position offers the opportunity to work closely with cross-functional teams, including product managers and software engineers, to build scalable AI solutions. You won’t be reinventing broken pipelines; you’ll be building on a strong foundation designed for scale and maintainability. You’ll own the lifecycle of ML systems, from training and batch inference to observability, and work closely with Directors of Engineering and Product to create pipelines that directly impact product strategy and business outcomes.

We seek someone with a strong technical background in Python, PySpark, PyTorch, TensorFlow, and SQL who can leverage cloud computing environments (such as AWS, Databricks, and Snowflake) to deploy and optimize models in production. Knowledge of the advertising industry is preferred.

Key Responsibilities

Architect, train, and maintain scalable ML systems for batch inference, including LLM and foundational model pipelines.
Build resilient training and inference workflows using Airflow and Databricks, focusing on automation and reproducibility.
Implement robust model monitoring solutions (e.g., Prometheus, Grafana, WhyLabs) to track drift, performance, and quality across the ML lifecycle.
Work closely with engineering teams to integrate model outputs into production systems, optimize data flows, and ensure fault tolerance.
Partner with product stakeholders to align ML efforts with business impact, KPIs, and product strategy.
Lead technical design reviews, contribute to internal libraries, and enforce engineering best practices (e.g., testing, versioning, modularity).
Stay current on the LLM ecosystem (LoRA, quantization, vector stores, inference optimization) and help guide adoption internally.
Perform experiments to validate new approaches and monitor model performance in production environments.
Keep up to date with the latest advancements in machine learning, LLMs, and foundational models to continuously improve solutions.

Qualifications

Master’s degree or PhD in Statistics, Computer Science, or a related discipline, with 5-10 years of industry experience.
Familiarity with LLMs, vector databases, Hugging Face, and tools like Ray or Triton Inference Server.
Strong grasp of MLOps best practices: versioning (MLflow), monitoring, testing, and reproducibility.
Experience working with Airflow for orchestration and cloud platforms like AWS (EC2, S3, SageMaker a plus).
Exposure to adtech or performance marketing is a strong plus.
Experience building and maintaining feature stores or large-scale ETL/ELT systems.
Skilled at writing production-grade code and designing modular, maintainable systems.
Experience with GraphQL, REST, HTTP, or gRPC basics and the ability to design and implement maintainable APIs.
Strong problem-solving skills with a passion for continuous learning and staying ahead of new technologies.
Excellent collaboration and interpersonal skills; ability to work effectively in a team-oriented, cross-functional environment.

Location:

The company is based in NYC and operates on a hybrid office/remote model. This role is remote, with the option of hybrid work in NYC.

Job-2735082