Data Engineer
Role details
Job location
Tech stack
Job description
- Python
- Spark Notebooks
- Apache Airflow
Data Architecture:
- Delta tables
- Parquet format
- Medallion architecture (bronze, silver, gold layers)
- Lakehouses and warehouses within the Fabric ecosystem
Visualization & Automation:
- Power BI and Power Automate (used by analysts)
Preferred Platforms:
- Databricks preferred over Azure Data Factory
Team members are expected to:
- Manage data ingestion
- Maintain the medallion architecture
- Support data consumption through gold layer outputs
- Collaborate across ingestion, transformation, and analytics
Top Skills Details
Full-stack data engineers with a focus on data ingestion
- Spark, Python, Delta
- Microsoft Fabric
Requirements
- Clarity/Caboodle
- Any of these data models: OMOP, CDISC, i2b2, FHIR
- A good understanding of, or experience in, de-identification and cohort building, and synthetic data creation
- Use of various accelerators/converters for data engineering tasks
Preferred Experience:
- Familiarity with the Azure stack
- Experience with Epic and SQL Server
- Background in healthcare data
Benefits & conditions
Funding Source: Texas Legislature
Platform: Microsoft Fabric, configured in an Azure environment
Architecture: Already defined and in place
Current Status:
- Several datasets have been onboarded
- Expanding scope to support the UT Real Health AI Initiative
- Transitioning to ingest data from all Health-Related Institutions (HRIs) in the UT system
Data Ingestion Strategy
Objective: Ingest Epic Caboodle data from multiple UT Health institutions
Considerations:
- Each institution uses a different version of Epic Caboodle, although all are based on SQL databases
- Each HRI has unique data structures
- A flexible ingestion process is needed to accommodate these differences
Timeline: Targeting December for initial data ingestion
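
For candidates wondering what a flexible ingestion process might look like in practice, one common approach is a per-institution configuration that maps each source's column names onto a shared schema before data lands in the bronze layer. The following is only a minimal sketch under stated assumptions: the institution keys (`hri_a`, `hri_b`), the column names, and the `normalize` helper are all hypothetical and do not reflect real Epic Caboodle schemas or the team's actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    """Per-institution settings; every value used below is illustrative."""
    institution: str
    column_map: dict  # source column name -> common column name


# Hypothetical institutions and column names, not real Caboodle schemas.
CONFIGS = {
    "hri_a": SourceConfig("hri_a", {"PatientKey": "patient_id", "BirthDate": "birth_date"}),
    "hri_b": SourceConfig("hri_b", {"PAT_KEY": "patient_id", "DOB": "birth_date"}),
}

# The shared schema every institution's rows are mapped onto (assumed).
COMMON_COLUMNS = ("patient_id", "birth_date")


def normalize(row: dict, config: SourceConfig) -> dict:
    """Rename source columns to the common schema and tag the row's origin.

    Columns missing from the source are filled with None; source columns
    without a mapping are dropped.
    """
    out = {common: None for common in COMMON_COLUMNS}
    for source_name, common_name in config.column_map.items():
        if source_name in row:
            out[common_name] = row[source_name]
    out["source_institution"] = config.institution
    return out
```

Calling `normalize({"PAT_KEY": 2, "DOB": "1975-05-05"}, CONFIGS["hri_b"])` yields a row with the same keys as one from `hri_a`, so downstream silver and gold transformations can treat all institutions uniformly. Onboarding a new HRI then means adding a configuration entry rather than writing a new pipeline.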