Data Engineer
Gravity Hair Salon, LLC
Columbus, United States of America
1 month ago
Role details
Contract type: Temporary to permanent
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $150K
Job location: Columbus, United States of America
Tech stack
Airflow
Azure
Google BigQuery
Clinical Data Repository
Cloud Computing
Data Transformation
Data Systems
Data Warehousing
Python
SQL Databases
Feature Engineering
Fast Healthcare Interoperability Resources
Large Language Models
Snowflake
Health Level Seven International
Redshift
Databricks
Job description
- Design, build, and maintain scalable data pipelines for large, complex clinical datasets (EHR, pathology, genomics, etc.)
- Implement and manage data transformations and analytics workflows using Databricks (Spark, Delta Lake)
- Ingest, standardize, and harmonize healthcare data into the OMOP Common Data Model (CDM)
- Partner with clinical, analytics, and ML teams to ensure data is reliable, well-documented, and fit for downstream use
- Lead data quality, validation, and observability efforts for clinical data pipelines
- Develop data models and schemas that support analytics, research, and ML use cases
- Optimize performance, cost, and reliability across the data platform
- Contribute to best practices around data governance, versioning, lineage, and reproducibility
- Translate data analysis requirements from commercial customers into clinical variables from OMOP, Epic, or other data models
Requirements
We are seeking a Senior Data Engineer with deep experience working with clinical and real-world healthcare data. This role will focus on building and scaling data pipelines that support analytics, research, and downstream machine learning use cases. The ideal candidate has hands-on experience with OMOP, Databricks, and modern data stacks, and understands the real-world challenges of clinical data harmonization across disparate sources.

- 5+ years of experience as a Data Engineer, with significant experience in healthcare or life sciences
- Strong hands-on experience with Databricks (Spark SQL, PySpark, Delta Lake)
- Deep understanding of OMOP CDM, including:
  - Standard vocabularies (SNOMED, LOINC, RxNorm, ICD, CPT)
  - ETL patterns for clinical data mapping and normalization
- Experience with clinical data harmonization, including:
  - Mapping heterogeneous source systems into a common schema
  - Managing missing, inconsistent, or conflicting clinical data
  - Understanding clinical workflows and data provenance
- Strong cloud experience, preferably with Azure services such as Azure Data Factory and other data-related tooling
- Proficiency in Python and SQL
- Experience with modern data stacks, including:
  - Cloud data warehouses or lakehouses (Databricks, Snowflake, BigQuery, Redshift)
  - Orchestration tools (Airflow, Dagster, Prefect)
  - Data transformation frameworks (dbt or equivalent)
- Strong data modeling and analytics engineering skills
Preferred / Nice-to-Have
- Experience working with real-world evidence (RWE), clinical research, or regulatory-facing datasets
- Familiarity with ML or feature engineering pipelines built on clinical data
- Experience supporting downstream LLM, NLP, or ML workloads using healthcare data
- Knowledge of healthcare data standards beyond OMOP (FHIR, HL7)
- Experience operating data systems in HIPAA-compliant environments