This is a remote position.
Position Summary
We're seeking a Data Engineer to architect and scale our data infrastructure as we handle increasingly complex and sensitive datasets. You'll lead the development of automated data pipelines that process everything from clinical notes to diagnostic imaging, ensuring compliance, performance, and reliability at scale.
This is a high-impact role where you'll shape our technical direction while working with cutting-edge data types in healthcare and media.
Key Responsibilities
Infrastructure Leadership:
- Design and implement scalable, automated data pipelines with robust validation, transformation, and compliance controls
- Optimize storage and retrieval systems to reduce costs while improving performance across the platform
- Lead architectural decisions for handling petabyte-scale, multi-modal datasets
Complex Data Processing:
- Extract, process, and transform unique data types including clinical notes, video metadata, medical imaging (DICOM), and unstructured documents
- Build systems that handle real-time and batch processing of sensitive healthcare and media data
- Implement data quality frameworks and monitoring systems
Cross-Functional Impact:
- Collaborate with AI/ML teams to optimize data pipelines for model training workflows
- Partner with product and engineering teams to deliver scalable solutions
- Mentor junior engineers and contribute to technical strategy discussions
Compliance & Security:
- Ensure HIPAA compliance and implement privacy-preserving data processing techniques
- Build audit trails and governance frameworks for sensitive data handling
- Design systems that meet enterprise security and compliance requirements
Qualifications
Required Experience:
- 5+ years in data engineering with proven experience scaling data systems in production
- Startup experience preferred: comfortable with ambiguity and rapid iteration
- Technical expertise: Python and/or Java in production environments, modern data infrastructure (Spark, Kafka, Airflow), cloud platforms (AWS/GCP/Azure)
- Independent execution: Ability to take vague requirements and deliver structured, scalable solutions
Preferred Qualifications:
- Experience with healthcare data compliance (HIPAA, GDPR) or media data pipelines
- Background with unstructured data processing (PDFs, images, video)
- Data privacy and security experience in regulated industries
- Previous work with ML/AI data pipelines and model training workflows
Compensation & Benefits
Compensation & Equity:
- Competitive salary commensurate with experience
- Meaningful equity stake in a fast-growing company
- Comprehensive benefits package
Growth & Impact:
- Front-row seat to solving one of AI's most strategic problems
- Autonomy to lead high-impact projects with significant technical scope
- Direct influence on product direction and company growth
Work Environment:
- Remote-first culture with flexible working arrangements
- Small, highly effective team with diverse backgrounds
- Access to cutting-edge technology and substantial infrastructure budget
- Collaborative environment with minimal bureaucracy
Ready to Shape the Future of AI Data Infrastructure?
Join us in building the platform that will enable the next generation of AI breakthroughs. If you're passionate about solving complex data challenges at scale and want to make a meaningful impact in healthcare and media AI, we'd love to hear from you.
_Koger Recruiting is conducting this search on behalf of our client. We are an equal opportunity recruiting firm committed to diversity and inclusion._