Machine Learning Engineer (Generative Video and Visual Models) - FT Freelance - Worldwide

Location

Remote anywhere

Hourly rate

Min. experience

3 - 5 years

Hours per week

40 hours

Duration

12 weeks

Freelance job

Posted a day ago

Actively recruiting / 3 applicants

We’re here to help you

Wilson Bittencourt is in direct contact with the company and can answer any questions you may have.

Wilson Bittencourt, Recruiter

Role Overview

We are seeking a skilled Machine Learning Engineer to design and scale the infrastructure behind our generative video and visual models for our innovative product, verv.fm. This role blends deep expertise in ML engineering with practical experience in state-of-the-art generative systems such as Stable Diffusion, Comfy, and Flux. Your work will be pivotal in building robust data pipelines and enhancing the quality and control of our media generation stack.

Responsibilities

Design and implement comprehensive data workflows, including ingestion, cleaning, validation, filtering, and quality scoring.
Fine-tune and deploy generative models like Stable Diffusion and Flux LoRAs, and build large-scale workflows with tools such as Comfy.
Train and work with video models like WAN and VACE.
Integrate advanced computer vision techniques, including segmentation, mask operations, and object tracking, into our generation pipelines.
Develop high-throughput pipelines for frame extraction, captioning, and extensive visual data processing.
Experiment with and develop context- and prompt-aware model orchestration strategies.
Contribute to observability and monitoring across the ML data lifecycle.
Collaborate with infrastructure teams to scale operations across GPU-backed and serverless environments, such as Fal.ai.
Operate within a dynamic, fast-paced product environment, incorporating both creative and technical inputs.
(Bonus) Investigate self-supervising or agentic workflows to automate pipeline feedback and enhancement.

Required Skills

Extensive experience in ML engineering, particularly in computer vision, generative media, or multimodal systems.
Proven hands-on experience fine-tuning or deploying generative models like SD and Flux, and building workflows with tools such as Comfy.
Proficiency in Python and asynchronous API development.
Familiarity with image/video-specific challenges, including frame alignment, codec handling, and perceptual quality scoring.
Experience with scalable data systems such as Airflow, Spark, or Ray.
Solid understanding of GPU infrastructure and best practices for model deployment.
Knowledge of prompt engineering and context-driven model behavior.
Ability to work effectively in ambiguous situations and bridge the gap between infrastructure and modeling challenges.