Data Engineer

Hri, Inc.
yesterday

Role details

Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

Airflow
Data Architecture
Information Engineering
Python
Microsoft SQL Server
Power BI
Azure
Parquet
Data Ingestion
Microsoft Power Automate
Fast Healthcare Interoperability Resources
Spark
Microsoft Fabric
PySpark
Epic Caboodle
Databricks
Data Generation

Job description

  • Python
  • Spark Notebooks
  • Apache Airflow

Data Architecture:
  • Delta tables
  • Parquet format
  • Medallion architecture (bronze, silver, gold layers)
  • Lakehouses and warehouses within the Fabric ecosystem

Visualization & Automation:
  • Power BI and Power Automate (used by analysts)

Preferred Platforms:
  • Databricks preferred over Azure Data Factory

Team members are expected to:
  • Manage data ingestion
  • Maintain medallion architecture
  • Support data consumption through gold layer outputs
  • Collaborate across ingestion, transformation, and analytics

Top Skills Details

Full-stack data engineers with a focus on data ingestion: Spark, Python, Delta, Microsoft Fabric

Requirements

  • Clarity/Caboodle
  • Any of these data models: OMOP, CDISC, i2b2, FHIR
  • A good understanding of, or experience in, de-identification and cohort building, and synthetic data creation
  • Use of various accelerators/converters for data engineering tasks

Preferred Experience:
  • Familiarity with the Azure stack
  • Experience with Epic and SQL Server
  • Background in healthcare data

Benefits & conditions

Funding Source: Texas Legislature
Platform: Microsoft Fabric, configured in an Azure environment
Architecture: Already defined and in place
Current Status:
  • Several datasets have been onboarded
  • Expanding scope to support UT Real Health AI Initiative
  • Transitioning to ingest data from all Health-Related Institutions (HRIs) in the UT system


Data Ingestion Strategy

Objective: Ingest Epic Caboodle data from multiple UT Health institutions
Considerations:
  • Each institution uses a different version of Epic Caboodle, although all are based on SQL databases
  • Each HRI has unique data structures
  • A flexible ingestion process is needed to accommodate these differences
Timeline: Targeting December for initial data ingestion
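
As a rough illustration of what a flexible ingestion process could look like, the sketch below maps each institution's source columns onto one shared shape through a per-institution profile. It is only a sketch under stated assumptions: the institution keys (`hri_a`, `hri_b`), the source and target column names, and the `SourceProfile` and `normalize` helpers are all hypothetical and do not describe the team's actual design or any real Epic Caboodle schema.

```python
from dataclasses import dataclass

# Hypothetical common columns that every institution's rows are mapped onto.
COMMON_COLUMNS = ("patient_id", "encounter_id", "admit_date")


@dataclass
class SourceProfile:
    """How one institution's extract maps to the common shape (illustrative only)."""
    institution: str
    column_map: dict  # source column name -> common column name


def normalize(row: dict, profile: SourceProfile) -> dict:
    """Rename mapped source columns, drop unmapped ones, fill missing ones with None."""
    renamed = {
        profile.column_map[name]: value
        for name, value in row.items()
        if name in profile.column_map
    }
    out = {column: renamed.get(column) for column in COMMON_COLUMNS}
    # Keep track of where the row came from for downstream layers.
    out["source_institution"] = profile.institution
    return out


# Hypothetical profiles: each institution names its columns differently.
PROFILES = {
    "hri_a": SourceProfile(
        "hri_a",
        {
            "PatientKey": "patient_id",
            "EncounterKey": "encounter_id",
            "AdmissionDate": "admit_date",
        },
    ),
    "hri_b": SourceProfile(
        "hri_b",
        {"PatientDurableKey": "patient_id", "EncounterKey": "encounter_id"},
    ),
}
```

With this approach, onboarding another institution would mean adding a profile rather than changing the ingestion code, which is one way to absorb differences in version and structure.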
