Senior Data Engineer (Governance Focus)
Job description
We are looking for a Senior Data Engineer with a strong foundation in modern data engineering and DevOps, and a clear focus on data and AI governance. This role goes beyond traditional pipeline development and plays a critical part in establishing governance-by-design across our data and analytics platform.
In addition to building and operating reliable data pipelines, this role will design and build AI agents and automation workflows that operate on governed data assets. These agents will be used to automate platform operations, data quality checks, metadata enrichment, and governance workflows, with an emphasis on traceability, security, and responsible use.
While our current platform is centered on Azure and Microsoft Fabric, we welcome candidates with strong data engineering experience on AWS, Databricks, or other modern data platforms, provided their skills are transferable and they demonstrate strong data engineering fundamentals.
How This Role Is Different
This role is for a senior data engineer who wants to move beyond building pipelines alone and take ownership of how data and AI are governed, automated, and scaled. You will still build data pipelines, but you will also define governance patterns, build AI agents that automate platform workflows, and embed controls directly into CI/CD processes.
Key Responsibilities
Data Engineering Foundations
Design, build, and operate batch and streaming data pipelines on modern cloud data platforms.
Develop robust ETL/ELT processes using SQL, Python, and PySpark with strong error handling, monitoring, and cost awareness.
Implement layered/medallion data architectures and analytics-ready data models to support BI and AI workloads.
Partner with analysts and data scientists to deliver trusted, production-grade data assets.
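To illustrate the kind of error-aware, layered transformation work described above, here is a minimal sketch in plain Python (field names and validation rules are hypothetical examples, not part of the actual platform; production code would use PySpark):

```python
# Illustrative bronze -> silver transformation step with explicit error
# handling: bad records are quarantined instead of failing the whole run.
# Field names ("customer_id", "amount") are hypothetical.

def to_silver(bronze_records):
    """Validate raw records; return (clean, quarantined) lists."""
    clean, quarantined = [], []
    for rec in bronze_records:
        try:
            if rec.get("customer_id") is None:
                raise ValueError("missing customer_id")
            clean.append({
                "customer_id": int(rec["customer_id"]),
                "amount": round(float(rec.get("amount", 0)), 2),
            })
        except (ValueError, TypeError) as err:
            # Route bad records to a quarantine zone for later review.
            quarantined.append({"record": rec, "error": str(err)})
    return clean, quarantined
```

The same pattern scales to a PySpark job by splitting a DataFrame on validation predicates and writing the failing rows to a quarantine table with the error reason attached.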
Data & AI Governance (Primary Focus)
Lead the implementation and evolution of data governance practices using Microsoft Purview or comparable governance platforms.
Define and operationalize governance standards for data products, AI/ML features, and agent-based workflows.
Ensure governance is embedded into platform workflows rather than handled as an after-the-fact process.
Act as a de facto owner for data and AI governance patterns as standards continue to evolve.
AI Agents & Automation
Design and build AI agents and intelligent automation workflows that support data platform and governance operations.
Develop agents to assist with metadata enrichment, data quality checks, anomaly detection, and governance workflows.
Ensure AI agents operate only on approved, governed data sources with appropriate logging and auditability.
Contribute to evolving AI governance patterns, including agent lifecycle management and input/output traceability.
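As a sketch of the guardrail pattern the bullets above describe, the snippet below wraps agent actions so they run only against approved, governed sources and leave an audit trail (the catalog contents and action names are hypothetical):

```python
# Illustrative guardrail: agent actions are allowed only on approved,
# governed data sources, and every attempt is logged for auditability.
# APPROVED_SOURCES stands in for a real governance catalog lookup.

import datetime

APPROVED_SOURCES = {"sales_silver", "customers_gold"}  # hypothetical catalog
AUDIT_LOG = []

def run_agent_action(action, source):
    """Execute an agent action only against an approved source; log every attempt."""
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "action": action,
        "source": source,
        "allowed": source in APPROVED_SOURCES,
    }
    AUDIT_LOG.append(entry)  # traceability: denied attempts are recorded too
    if not entry["allowed"]:
        raise PermissionError(f"{source!r} is not an approved governed source")
    return f"{action} executed on {source}"
```

In a real deployment the catalog check would query the governance platform and the log would go to a durable audit store, but the shape of the control is the same.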
DevOps & Platform Enablement
Implement and maintain CI/CD pipelines for data and AI assets.
Promote infrastructure-as-code practices (Terraform, Bicep, or equivalent) for repeatable, governed environments.
Define environment promotion paths (dev, test, prod) with embedded governance and policy checks.
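A policy-as-code gate of the kind described above can be as simple as a check that runs in CI and fails the build when governance metadata is missing. The sketch below uses hypothetical required keys, not the platform's actual policy:

```python
# Illustrative policy-as-code check for a CI pipeline: reject data-asset
# configs that lack governance metadata. Required keys are hypothetical.

REQUIRED_KEYS = {"owner", "classification", "environment"}

def policy_violations(asset_config):
    """Return a list of governance policy violations for one asset config."""
    missing = REQUIRED_KEYS - asset_config.keys()
    violations = [f"missing required key: {k}" for k in sorted(missing)]
    if (asset_config.get("environment") == "prod"
            and asset_config.get("classification") == "unclassified"):
        violations.append("prod assets must be classified")
    return violations

# A CI step would fail the promotion when the returned list is non-empty.
```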
Requirements
- Bachelor's degree AND 5+ years of experience in data engineering or data platform engineering in a production environment.
- Strong SQL skills and hands-on experience with Python and PySpark.
- Experience in data engineering on at least one major cloud or data platform (Azure, AWS, Databricks, or equivalent), with skills that are transferable across platforms.
- Experience with Git-based development and CI/CD practices.
- Experience with data governance concepts such as cataloging, lineage, data quality, and access control.
- Demonstrated engineering judgment in balancing speed, safety, and governance.
Preferred Qualifications
- Hands-on experience with Microsoft Purview or comparable data governance platforms.
- Experience building AI agents, LLM-based automation, or intelligent assistants for operational or platform workflows.
- Experience embedding governance controls into DevOps workflows (policy-as-code mindset).
- Familiarity with Microsoft Fabric or Databricks-centric architectures, and the ability to learn new platforms quickly.