Principal Data Engineer

Simplifi Data Pool

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Tech stack

Airflow

Azure

Cloud Engineering

Profiling

Software Quality

Continuous Integration

Information Engineering

Data Governance

ETL

Data Vault Modeling

Data Flow Control

GitHub

Python

Machine Learning

Performance Tuning

Query Optimization

Raw Data

Reference Data

Elearning

Standard SQL

Data Streaming

Spark

Containerization

PySpark

Kubernetes

Data Lineage

SAP MDG

Kafka

Terraform

Software Version Control

Data Pipelines

Docker

Databricks

Job description

What you do

- Define and enforce data engineering standards: pipeline design patterns, naming conventions, modelling approaches (e.g., Data Vault). Make architecture decisions and translate business/domain requirements into technical solutions.
- Design and oversee end-to-end data pipelines: ingestion, transformation, loading. In SimpliFi terms, this means owning flows across the Raw Data Vault, Business Data Vault, UJT, and CORE layers, often spanning tools like Databricks, Azure Synapse, and IDMC.
- Own data quality at the pipeline level: define checks, monitor, and resolve data issues before they propagate downstream.
- Guide junior and mid-level engineers, set the bar for code quality, and conduct reviews. A key multiplier for team capability.
- Work closely with Data Modellers, MDM/RDM specialists, Data Architects, and IT component leads. Bridge the gap between technical implementation and business intent.
- Ensure pipelines meet data governance requirements (lineage, classification, access controls) and support gate processes where relevant.
- Act as a technical proxy in planning sessions, helping size effort, flag dependencies, and unblock the team during sprint/delivery cycles.

Requirements

What you bring

- Hands-on expertise building and operating large-scale pipelines, both batch and streaming. Proficiency in tools like Apache Spark, Kafka, Airflow, and cloud-native equivalents (ADF, Glue, Dataflow). Experience with ETL/ELT patterns at enterprise scale.
- Practical experience with modern lakehouse/warehouse platforms such as Databricks, Azure Synapse, or equivalents. Understanding of partitioning, clustering, and query optimisation.
- Deep working knowledge of at least one major cloud platform, Azure in particular: compute, storage, networking, managed services. In enterprise contexts like SimpliFi, Azure is typically dominant (ADLS, ADF, Azure Databricks).
- Solid grasp of modelling paradigms: relational, dimensional (Kimball), and Data Vault 2.0. Ability to review and contribute to models, not just implement them.
- Strong SQL (complex transformations, performance tuning), Python (pipeline logic, data quality scripts), and familiarity with Scala or PySpark for distributed workloads.
- Experience implementing DQ rules, profiling, and monitoring, ideally with tools like Acceldata, Great Expectations, or Monte Carlo.
- Understanding of master and reference data flows: how golden records are created, maintained, and consumed. Experience with platforms like SAP MDG or Informatica MDM is a strong plus.
- CI/CD for data pipelines (GitHub Actions, Azure DevOps), version control discipline, infrastructure-as-code basics (Terraform), and containerisation (Docker/Kubernetes awareness).
- Able to read and contribute to architecture documents; understands layers, zones, and data flow patterns across a platform. Can engage meaningfully with architects without needing hand-holding.
- Know how to implement metadata capture, data lineage (e.g. via IDMC or Purview), and access controls. Experience navigating governance frameworks and gate processes.
- Familiarity with machine learning is a strong advantage.

What we offer

- We offer a hybrid work model which recognizes the value of striking a balance between in-person collaboration and remote working, including up to 25 days per year working from abroad.
- We believe in rewarding performance, and our compensation and benefits package includes a company bonus scheme, pension, employee shares program, and multiple employee discounts (details vary by location).
- From career development and digital learning programs to international career mobility, we offer lifelong learning for our employees worldwide and an environment where innovation, delivery, and empowerment are fostered.
- Flexible working, health and wellbeing offers.

About the company

About the Job

Submit your application below after reading all the details and supporting information about this job opportunity.

The SimpliFi Data Pool is a central data repository hosted on Azure, designed to unify OE financial data and enhance cost efficiency. It serves as the primary source of accounting and reporting data, ensuring compliance with mandatory reporting requirements. The Data Pool is a key component of the SimpliFi program, integrated with SAP, providing a harmonized data model with Common Core components, including a globally harmonized Chart of Accounts and standardized accounting and reporting processes.

The ecosystem optimizes finance data management through key components: structured data storage (Staging Area, Raw Data Vault, Business Data Vault), customized feeder system integrations, seamless SAP connectivity via Azure Functions, and a Power BI-enabled reporting layer for in-depth analysis and data lineage transparency. It features Master and Reference Data Management (Informatica IDMC), rigorous data quality checks, and the SimpliFi Studio interface for user-friendly manual uploads and pipeline monitoring, ensuring data accuracy, reliability, and operational efficiency.
