💻 Software Engineer II, Data
📍 Remote East Coast
💸 Strong base + equity
We’re partnering with a rapidly scaling biotech company that’s using advanced machine learning to tackle some of the most complex challenges in drug discovery and development.
They’re seeking a Senior Data Engineer or Software Engineer with a strong data focus to play a foundational role in building and evolving the data platforms that power their AI-driven research efforts. This is a high-impact opportunity to influence core technical strategy, work closely with scientists and ML engineers, and help accelerate the discovery of next-generation therapeutics with the potential to make a real difference for patients.
Responsibilities:
- Build and continuously refine data pipelines that ingest and transform large, diverse datasets from both internal systems and external providers into high-quality inputs for machine learning models.
- Develop and advance the data storage and access layer to enable scalable analytics, controlled schema changes, reproducible datasets, and fast, reliable retrieval.
- Partner closely with machine learning engineers to enhance the efficiency, stability, and maintainability of Python-centric data processing and experimentation workflows.
Requirements:
- Significant professional experience in data engineering or a closely related field, typically demonstrated by one of the following:
  - a bachelor’s degree with ~8+ years of relevant work,
  - a master’s degree with ~6+ years, or
  - a doctoral degree with ~3+ years.
- Demonstrated track record of building robust, adaptable ETL and data processing systems that are easy to evolve and maintain.
- Experience developing and operating data pipelines using modern workflow orchestration or distributed processing platforms (e.g., workflow schedulers, cluster-based compute frameworks).
- Solid understanding of the machine learning development lifecycle; exposure to research, scientific computing, or ML-driven workflows is beneficial.
- Proven ability to work with very large datasets, including processing at multi-terabyte scale.
- Working knowledge of cloud infrastructure, particularly in AWS-like environments; familiarity with container orchestration platforms is a plus.
- Experience with modern data lake and table formats, such as columnar storage and metadata-driven lakehouse technologies.
- Strong proficiency in Python, with an emphasis on writing production-quality, well-structured code.
- Practical, solution-oriented mindset with the ability to balance technical tradeoffs and support rapid experimentation by ML or research teams.
- Prior experience in life sciences domains (e.g., bioinformatics, chemistry, or related fields) is advantageous but not required.
📧 Interested in applying? Please click the ‘Easy Apply’ button, or email me your resume at stefani.lukic@storm3.com
_⚡ Storm3 is a HealthTech & Biotech recruitment firm with clients across major tech hubs in Europe, APAC and North America. To discuss open opportunities or career options, please visit our website at storm3.com and follow the Storm3 LinkedIn page for the latest jobs and intel._