About BEKhealth
BEKhealth is transforming how clinical research is conducted by unlocking the full potential of healthcare data. Our AI-powered BEK platform processes structured and unstructured EMR data to match patients to clinical trials, accelerating access to new treatments and improving outcomes.
We integrate cutting-edge AI, NLP, and data engineering to build the data infrastructure that powers the future of clinical research.
The Role
We’re looking for a Data Processing Engineer to help expand and optimize BEKhealth’s data processing engine. You’ll work on the core system that ingests, processes, and transforms clinical data from diverse EMR sources, enabling accurate patient matching for clinical trials and supporting our growing clinical research operations.
This role is ideal for an engineer who thrives at the intersection of AI/LLM processing, NLP pipelines, data integration, and PHI anonymization. You’ll design and enhance large-scale ETL workflows, build anonymization and de-identification pipelines, and create tools that allow researchers to safely use de-identified datasets.
You’ll work closely with our AI, data science, and product teams to ensure data quality, privacy, and performance at scale.
What You’ll Do
- Develop and maintain BEKhealth’s data processing and ETL pipelines across multiple EMR integrations.
- Enhance AI/LLM and NLP pipelines that extract and normalize unstructured healthcare data.
- Design and implement PHI anonymization and de-identification frameworks to ensure HIPAA compliance.
- Build internal dataset creation tools that allow researchers to securely generate and validate de-identified datasets.
- Collaborate with QA and data validation engineers to implement automated data quality checks.
- Optimize data ingestion, transformation, and normalization workflows for speed and accuracy.
- Work closely with data scientists to improve model input and output pipelines.
What We’re Looking For
- 3–6 years of experience in data engineering, NLP, or healthcare data processing.
- Proficiency in Python and SQL (must-have).
- Strong experience with ETL frameworks, data pipelines, and APIs (Airflow, dbt, Spark, or similar).
- Experience with PHI data handling, anonymization, and HIPAA compliance.
- Familiarity with NLP frameworks (spaCy, Hugging Face Transformers, or similar).
- Experience integrating with EMR systems or working with healthcare data standards (FHIR, HL7, OMOP).
- Familiarity with AWS, Kubernetes, Celery, or other modern data orchestration and pipeline tools.
Why You’ll Enjoy Working Here
- Be part of a team turning messy, complex healthcare data into life-changing clinical insights.
- Collaborate with experts in AI, NLP, and health data engineering.
- Competitive salary and full benefits.
- Flexible remote work with unlimited PTO.
- Mission-driven culture where your work directly accelerates medical research.