Data Engineer - Baltimore City, MD
Job description
Join us in driving growth and seizing new business opportunities.
Background
Client is seeking a hands-on Data Engineer to design, develop, and optimize large-scale data pipelines in support of our Enterprise Data Warehouse (EDW) and Data Lake solutions. This role requires deep technical expertise in coding, pipeline orchestration, and cloud-native data engineering on AWS. The Data Engineer will be directly responsible for implementing ingestion, transformation, and integration workflows - ensuring data is high-quality, compliant, and analytics-ready. This role may support other projects or teams within MDH as needed.
Role and Responsibilities
Responsible for designing, building, and maintaining data pipelines and infrastructure to support data-driven decisions and analytics. The individual is responsible for the following tasks:
- Design, develop, and maintain data pipelines and extract, transform, load (ETL) processes to collect, process, and store structured and unstructured data
- Build data architecture and storage solutions, including data lakehouses, data lakes, data warehouses, and data marts, to support analytics and reporting
- Develop data reliability, efficiency, and quality checks and processes
- Prepare data for data modeling
- Monitor and optimize data architecture and data processing systems
- Collaborate with multiple teams to understand requirements and objectives
- Administer testing and troubleshooting related to performance, reliability, and scalability
- Create and update documentation
Hands-On Data Pipeline Development
- Design, code, and deploy ETL/ELT pipelines across bronze, silver, and gold layers of the Data Lakehouse.
- Build ingestion pipelines for structured (SQL), semi-structured (JSON, XML), and unstructured data using PySpark/Python on AWS Glue or EMR.
- Implement incremental loads, deduplication, error handling, and data validation.
- Actively troubleshoot, debug, and optimize pipelines for scalability and cost efficiency.
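To illustrate the incremental-load and deduplication work listed above, here is a minimal sketch in plain Python. It is not the client's implementation: production pipelines for this role would run in PySpark on Glue or EMR, and the function name `incremental_dedup` and the record fields `id` and `updated_at` are hypothetical.

```python
from datetime import datetime


def incremental_dedup(records, watermark):
    """Return records changed after `watermark`, one (the latest) per id.

    Assumes each record is a dict with hypothetical keys `id` and
    `updated_at` (a datetime). Rows at or before the watermark are
    treated as already loaded and skipped.
    """
    latest = {}
    for rec in records:
        # Incremental load: ignore anything already covered by the last run.
        if rec["updated_at"] <= watermark:
            continue
        # Deduplication: keep only the most recent version of each id.
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
    # Sort by id so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r["id"])
```

In a real pipeline the watermark would typically be persisted between runs (for example in a control table) and advanced only after a successful load.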
EDW & Data Lake Implementation
- Develop dimensional data models (Star Schema, Snowflake Schema) for analytics and reporting.
- Build and maintain tables in Iceberg, Delta Lake, or equivalent open table formats (OTF).
- Optimize partitioning, indexing, and metadata for fast query performance.
Healthcare Data Integration
- Build ingestion and transformation pipelines for EDI X12 transactions (837, 835, 278, etc.).
- Implement mapping and transformation of EDI data with FHIR and HL7 frameworks.
- Work hands-on with AWS HealthLake (or equivalent) to store and query healthcare data.
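As a rough illustration of the first step in an EDI X12 ingestion pipeline, the sketch below splits a raw interchange into segments and elements and reads the transaction set code (such as 837) from each ST segment. It assumes the common `~` segment terminator and `*` element separator; real interchanges declare their delimiters in the ISA segment, and the function names and sample data here are hypothetical.

```python
def parse_x12_segments(raw, segment_terminator="~", element_separator="*"):
    """Split raw X12 text into a list of {"id", "elements"} dicts.

    The delimiters are assumptions; a production parser would read them
    from the ISA header instead of hard-coding them.
    """
    segments = []
    for chunk in raw.split(segment_terminator):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final terminator
        elements = chunk.split(element_separator)
        segments.append({"id": elements[0], "elements": elements[1:]})
    return segments


def transaction_set_ids(segments):
    """Return the ST01 value (transaction set code) of every ST segment."""
    return [s["elements"][0] for s in segments if s["id"] == "ST" and s["elements"]]


# Hypothetical, heavily truncated sample; not a valid claim file.
SAMPLE = "ST*837*0001~BHT*0019*00*123*20240101*1200*CH~SE*3*0001~"
```

Calling `transaction_set_ids(parse_x12_segments(SAMPLE))` on the sample yields the single code `"837"`, which a pipeline could use to route the payload to the matching transformation.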
Data Quality, Security & Compliance
- Develop automated validation scripts to enforce data quality and integrity.
- Implement IAM roles, encryption, and auditing to meet HIPAA and CMS compliance standards.
- Maintain lineage and governance documentation for all pipelines.
Collaboration & Delivery
- Work closely with the Lead Data Engineer, analysts, and data scientists to deliver pipelines that support enterprise-wide analytics.
- Actively contribute to CI/CD pipelines, Infrastructure-as-Code (IaC), and automation.
- Continuously improve pipelines and adopt new technologies where appropriate.
Requirements
Do you have experience in Spark implementation? Do you have a Master's degree?
- The candidate should have experience as a data engineer or in a similar role, with a strong understanding of data architecture and ETL processes. The candidate should be proficient in programming languages for data processing and knowledgeable about distributed computing and parallel processing.
- This position requires a bachelor's or master's degree from an accredited college or university with a major in computer science, statistics, mathematics, economics, or a related field. Three (3) years of equivalent experience in a related field may be substituted for the bachelor's degree.
- 3+ years of hands-on experience in building, deploying, and maintaining data pipelines on AWS or equivalent cloud platforms.
- Strong coding skills in Python and SQL (Scala or Java a plus).
- Proven experience with Apache Spark (PySpark) for large-scale processing.
- Hands-on experience with AWS Glue, S3, Redshift, Athena, EMR, Lake Formation.
- Strong debugging and performance optimization skills in distributed systems.
- Hands-on experience with Iceberg, Delta Lake, or other open table formats (OTF).
- Experience with Airflow or other pipeline orchestration frameworks.
- Practical experience in CI/CD and Infrastructure-as-Code (Terraform, CloudFormation).
- Practical experience with EDI X12, HL7, or FHIR data formats.
- Strong understanding of Medallion Architecture for data lakehouses.
- Hands-on experience building dimensional models and data warehouses.
- Working knowledge of HIPAA and CMS interoperability requirements.