Lead Data Engineer
Job description
The SimpliFi Data Pool is a central data repository hosted on Azure, designed to unify OE financial data and improve cost efficiency. It serves as the primary source of accounting and reporting data, ensuring compliance with mandatory reporting requirements. A key component of the SimpliFi program, the Data Pool is integrated with SAP and provides a harmonized data model built on Common Core components, including a globally harmonized Chart of Accounts and standardized accounting and reporting processes.
The ecosystem optimizes finance data management through several key components: structured data storage (Staging Area, Raw Data Vault, Business Data Vault), customized feeder system integrations, SAP connectivity via Azure Functions, and a Power BI-enabled reporting layer for in-depth analysis and transparent data lineage. It also includes Master and Reference Data Management (Informatica IDMC), rigorous data quality checks, and the SimpliFi Studio interface for manual uploads and pipeline monitoring, which together ensure data accuracy, reliability, and operational efficiency.
What you do
- Define and enforce data engineering standards: pipeline design patterns, naming conventions, and modelling approaches (e.g. Data Vault). Make architecture decisions and translate business and domain requirements into technical solutions.
- Design and oversee end-to-end data pipelines: ingestion, transformation, and loading. In SimpliFi terms, this means owning flows across the Raw Data Vault, Business Data Vault, UJT, and CORE layers, often spanning tools such as Databricks, Azure Synapse, and IDMC.
- Own data quality at the pipeline level: define checks, monitor data, and resolve issues before they propagate downstream.
- Guide junior and mid-level engineers, set the bar for code quality, and conduct code reviews, acting as a key multiplier of team capability.
- Work closely with Data Modellers, MDM/RDM specialists, Data Architects, and IT component leads. Bridge the gap between technical implementation and business intent.
- Ensure pipelines meet data governance requirements (lineage, classification, access controls) and support gate processes where relevant.
- Act as a technical proxy in planning sessions, helping size effort, flag dependencies and unblock the team during sprint/delivery cycles.
Requirements
Screening questions: Do you have experience with Terraform? Do you have a Master's degree?
- Hands-on expertise building and operating large-scale batch and streaming pipelines. Proficiency in tools such as Apache Spark, Kafka, and Airflow, and their cloud-native equivalents (ADF, Glue, Dataflow). Experience with ETL/ELT patterns at enterprise scale. Practical experience with modern lakehouse/warehouse platforms such as Databricks, Azure Synapse, or equivalents, including an understanding of partitioning, clustering, and query optimisation.
- Deep working knowledge of at least one major cloud platform, preferably Azure: compute, storage, networking, and managed services. In enterprise contexts like SimpliFi, Azure is typically the dominant platform (ADLS, ADF, Azure Databricks).
- Solid grasp of modelling paradigms: relational, dimensional (Kimball), and Data Vault 2.0. Ability to review and contribute to models, not just implement them.
- Strong SQL (complex transformations, performance tuning) and Python (pipeline logic, data quality scripts), plus familiarity with Scala or PySpark for distributed workloads.
- Experience implementing data quality (DQ) rules, profiling, and monitoring, ideally with tools such as Acceldata, Great Expectations, or Monte Carlo.
- Understanding of master and reference data flows - how golden records are created, maintained, and consumed. Experience with platforms like SAP MDG or Informatica MDM is a strong plus.
- Experience with CI/CD for data pipelines (GitHub Actions, Azure DevOps), version control discipline, infrastructure-as-code basics (Terraform), and containerisation (awareness of Docker/Kubernetes).
- Able to read and contribute to architecture documents: understands layers, zones, and data flow patterns across a platform, and can engage meaningfully with architects without needing hand-holding.
- Knows how to implement metadata capture, data lineage (e.g. via IDMC or Purview), and access controls, with experience navigating governance frameworks and gate processes.
- Familiarity with machine learning is a strong advantage.
Benefits & conditions
- We offer a hybrid work model that balances in-person collaboration with remote working, including up to 25 days per year working from abroad.
- We believe in rewarding performance and our compensation and benefits package includes a company bonus scheme, pension, employee shares program and multiple employee discounts (details vary by location).
- From career development and digital learning programs to international career mobility, we offer lifelong learning for our employees worldwide and an environment where innovation, delivery and empowerment are fostered.
- Flexible working and health and wellbeing offers (including healthcare and parental leave benefits) help our people balance family and career, and support them in returning from career breaks with experience that nothing else can teach.