Principal Data Engineer

SimpliFi Data Pool
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

Airflow
Azure
Cloud Engineering
Profiling
Software Quality
Continuous Integration
Information Engineering
Data Governance
ETL
Data Vault Modeling
Data Flow Control
GitHub
Python
Machine Learning
Performance Tuning
Query Optimization
Raw Data
Reference Data
E-learning
Standard SQL
Data Streaming
Spark
Containerization
PySpark
Kubernetes
Data Lineage
SAP MDG
Kafka
Terraform
Software Version Control
Data Pipelines
Docker
Databricks

Job description

Reference Data Management (Informatica IDMC), rigorous data quality checks, and the SimpliFi Studio interface for user-friendly manual uploads and pipeline monitoring, ensuring data accuracy, reliability, and operational efficiency.

What you do

- Define and enforce data engineering standards: pipeline design patterns, naming conventions, and modelling approaches (e.g., Data Vault). Make architecture decisions and translate business and domain requirements into technical solutions.
- Design and oversee end-to-end data pipelines: ingestion, transformation, and loading. In SimpliFi terms, this means owning flows across the Raw Data Vault, Business Data Vault, UJT, and CORE layers, often spanning tools like Databricks, Azure Synapse, and IDMC.
- Own data quality at the pipeline level: define checks, monitor, and resolve data issues before they propagate downstream.
- Guide junior and mid-level engineers, set the bar for code quality, and conduct reviews. This makes the role a key multiplier for team capability.

Requirements

- Work closely with Data Modellers, MDM/RDM specialists, Data Architects, and IT component leads. Bridge the gap between technical implementation and business intent.
- Ensure pipelines meet data governance requirements (lineage, classification, access controls) and support gate processes where relevant.
- Act as a technical proxy in planning sessions, helping to size effort, flag dependencies, and unblock the team during sprint and delivery cycles.

What you bring

- Hands-on expertise building and operating large-scale pipelines, both batch and streaming. Proficiency in tools like Apache Spark, Kafka, Airflow, and cloud-native equivalents (ADF, Glue, Dataflow). Experience with ETL/ELT patterns at enterprise scale.
- Practical experience with modern lakehouse and warehouse platforms such as Databricks, Azure Synapse, or equivalents. Understanding of partitioning, clustering, and query optimisation.
- Deep working knowledge of at least one major cloud (Azure): compute, storage, networking, and managed services. In enterprise contexts like SimpliFi, Azure is typically dominant (ADLS, ADF, Azure Databricks).
- Solid grasp of modelling paradigms: relational, dimensional (Kimball), and Data Vault 2.0. Ability to review and contribute to models, not just implement them.
- Strong SQL (complex transformations, performance tuning), Python (pipeline logic, data quality scripts), and familiarity with Scala or PySpark for distributed workloads.
- Experience implementing DQ rules, profiling, and monitoring, ideally with tools like Acceldata, Great Expectations, or Monte Carlo.
- Understanding of master and reference data flows: how golden records are created, maintained, and consumed. Experience with platforms like SAP MDG or Informatica MDM is a strong plus.
- CI/CD for data pipelines (GitHub Actions, Azure DevOps), version control discipline, infrastructure-as-code basics (Terraform), and containerisation (Docker/Kubernetes awareness).
- Able to read and contribute to architecture documents, with an understanding of layers, zones, and data flow patterns across a platform. Can engage meaningfully with architects without needing hand-holding.
- Knowledge of how to implement metadata capture, data lineage (e.g. via IDMC or Purview), and access controls. Experience navigating governance frameworks and gate processes.
- Familiarity with machine learning is a strong advantage.

What we offer

- We offer a hybrid work model which recognizes the value of striking a balance between in-person collaboration and remote working, including up to 25 days per year working from abroad.
- We believe in rewarding performance, and our compensation and benefits package includes a company bonus scheme, pension, employee shares program, and multiple employee discounts (details vary by location).
- From career development and digital learning programs to international career mobility, we offer lifelong learning for our employees worldwide and an environment where innovation, delivery, and empowerment are fostered.
- Flexible working, health and wellbeing offers.

About the company

About the Job

Submit your application below after reading all the details and supporting information about this job opportunity.

The SimpliFi Data Pool is a central data repository hosted on Azure, designed to unify OE financial data and enhance cost efficiency. It serves as the primary source of accounting and reporting data, ensuring compliance with mandatory reporting requirements. The Data Pool is a key component of the SimpliFi program and is integrated with SAP, providing a harmonized data model with Common Core components, including a globally harmonized Chart of Accounts and standardized accounting and reporting processes. The ecosystem optimizes finance data management through key components: structured data storage (Staging Area, Raw Data Vault, Business Data Vault), customized feeder system integrations, seamless SAP connectivity via Azure Functions, and a Power BI-enabled reporting layer for in-depth analysis and data lineage transparency. It features Master and
