Azure Data Engineer with AI and Fabric experience

Wall Street Consulting Services
Warren, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Warren, United States of America

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Unit Testing
Azure
Encodings
Data Vault Modeling
JSON
Query Optimization
Power BI
Standard SQL
Search Technologies
Data Streaming
Parquet
Large Language Models
Spark
Git
Microsoft Fabric
Data Lake
PySpark
Key Vault

Requirements

1. Microsoft Fabric & lakehouse engineering: Hands-on production experience with Microsoft Fabric, including OneLake, Lakehouse, Data Factory pipelines, Spark notebooks, and Direct Lake mode for Power BI.

Must have built and operated something real on Fabric. Fluent in PySpark and Delta Lake: MERGE, schema evolution, time travel, OPTIMIZE, partitioning. Has built incremental ingestion from ADLS Gen2 or S3 into curated Delta tables. Strong SQL (window functions, CTEs, query optimization) and a working knowledge of Parquet internals, partitioning, predicate pushdown, and compaction.

2. Event-driven ingestion on Azure: Has built production ingestion using Azure Event Hubs.

3. Python engineering & Azure platform fundamentals: Modular Python code with type hints, unit tests, and packaging.

Git-based workflow with pull requests and CI. Working knowledge of ADLS Gen2, Azure Active Directory, Key Vault, and Managed Identity permissions; the engineer needs to handle them confidently from day one.

4. Source-to-target data modeling with a canonical layer: Has built or contributed to a canonical model (EDP, data vault, or dimensional) that decouples source systems from downstream consumers. Understands medallion architecture (Bronze / Silver / Gold) and can explain why each layer exists and what belongs in it.

5. Production LLM & embedding experience using an LLM API (Azure OpenAI strongly preferred for our stack) with structured output via JSON mode or function calling, paired with an embedding model for semantic search or matching.
