ETL Data Engineer (Healthcare)
Job description
We are looking for a Senior ETL Data Engineer with strong expertise in AWS-based data engineering and healthcare claims processing. This role involves designing and managing large-scale, HIPAA-compliant data pipelines handling millions to hundreds of millions of claims records.
The candidate will act as a technical leader, working closely with analytics, clinical, compliance, and product teams.
Responsibilities
Claims Data Processing
- Process and validate EDI 837 transactions at scale
- Handle complete claims lifecycle workflows
- Work with multi-source healthcare data ingestion
ETL & Data Architecture
- Build scalable AWS Glue pipelines
- Design Iceberg-based data lakes
- Optimize Redshift data warehouse performance
Data Engineering
- Design and manage DynamoDB & PostgreSQL systems
- Optimize queries for large-scale datasets
Orchestration & Automation
- Build and maintain Airflow DAGs
- Implement CI/CD pipelines and automation
Data Quality & Governance
- Ensure data accuracy, lineage, and auditability
- Maintain compliance with healthcare regulations
Requirements
- Strong experience with ANSI X12 EDI transactions: 837P, 837I, 837D
- Knowledge of the full claims lifecycle: 835 (ERA), 270/271 (Eligibility), 276/277 (Claim Status)
- Experience with ICD-10, CPT, HCPCS, NPI, and Revenue Codes
- Understanding of HIPAA 5010 compliance
- Experience handling large-scale claims data (millions+ records)
AWS & Cloud Services
- AWS Glue (PySpark ETL pipelines)
- Amazon Redshift (data warehousing & performance tuning)
- Amazon Athena
- Amazon S3 & Lake Formation
Experience with:
- Apache Iceberg (schema evolution, partitioning, time travel)
- Amazon Kinesis (streaming ingestion)
- AWS Step Functions / Lambda
Programming & ETL
- Strong in Python / PySpark
- Experience building ETL/ELT pipelines at scale
- Handling multi-format data: EDI, JSON, CSV, XML, APIs, HL7 FHIR
Databases & SQL
- Expert-level SQL: joins, CTEs, window functions, query optimization
Hands-on experience with:
- Amazon DynamoDB (GSI/LSI, single-table design)
- PostgreSQL (partitioning, indexing, stored procedures)
Orchestration & DataOps
- Apache Airflow (MWAA) DAG development
- dbt transformations, testing, and modeling
- CI/CD tools: GitHub Actions / AWS CodePipeline
- Data quality tools (Great Expectations / AWS Deequ)
- Data lineage & monitoring (CloudWatch, SNS)
Strong knowledge of:
- HIPAA / HITECH compliance
- Encryption (KMS), IAM access control