Data Engineer

SimpliFi Data Pool
Municipality of Madrid, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote
Municipality of Madrid, Spain

Tech stack

Azure
Cloud Engineering
Profiling
Code Review
Continuous Integration
Data Governance
Data Integrity
ETL
Data Vault Modeling
GitHub
Python
Machine Learning
Performance Tuning
Scrum
Query Optimization
Raw Data
Reference Data
Standard SQL
Data Streaming
Spark
PySpark
Data Lineage
SAP MDG
Terraform
Software Version Control
Data Pipelines
Databricks

Job description

interface for manual uploads and pipeline monitoring, ensuring seamless operations, high data integrity, and informed decision-making.

What you do

Implement data pipelines in line with established engineering standards - pipeline design patterns, naming conventions, and modelling approaches (e.g. Data Vault). Apply architecture decisions made at Lead level and translate them into working, maintainable solutions.

Build and maintain end-to-end data pipelines across ingestion, transformation, and loading. In SimpliFi terms, this means developing and owning flows across Raw Data Vault, Business Data Vault, UJT, and CORE layers - working hands-on with tools such as Databricks, Azure Synapse and IDMC.

Take responsibility for data quality within your pipeline scope - implementing checks, identifying issues early, and resolving them before they propagate downstream. Escalate systemic or cross-domain issues to the Lead.

Review code from junior and mid-level engineers, share best practices and contribute

Requirements

to raising team capability. Act as a day-to-day technical reference for less experienced colleagues.

Work alongside Data Modellers, MDM/RDM specialists, and IT component leads. Translate technical constraints and findings into clear inputs for planning and design discussions.

Ensure pipelines are built in line with data governance requirements - lineage, classification, and access controls. Support gate processes (e.g. DAB/CCP) by providing accurate technical evidence and documentation.

Actively participate in sprint planning and delivery cycles - providing effort estimates, flagging technical dependencies, and keeping work moving within your delivery lane.

What you bring

Hands-on expertise building and operating large-scale pipelines - batch and streaming. Proficiency in tools like Apache Spark, Databricks and cloud-native equivalents. Experience with ETL/ELT patterns at enterprise scale.

Practical experience with modern lakehouse/warehouse platforms - Databricks, Azure Synapse, or equivalents. Understanding of partitioning, clustering and query optimisation.

Deep working knowledge of at least one major cloud (Azure) - compute, storage, networking, managed services. In enterprise contexts like SimpliFi, Azure is typically dominant (ADLS, ADF, Azure Databricks).

Solid grasp of modelling paradigms - Medallion, dimensional (Kimball) and Data Vault 2.0. Ability to review and contribute to models, not just implement them.

Know how to implement metadata capture, data lineage (e.g. via IDMC or Purview) and access controls. Experience navigating governance frameworks and gate processes.

Strong SQL (complex transformations, performance tuning), Python (pipeline logic, data quality scripts) and familiarity with PySpark for distributed workloads.

Experience implementing DQ rules, profiling and monitoring - ideally with tools like DQX or Great Expectations.

Understanding of master and reference data flows - how golden records are created, maintained and consumed. Experience with platforms like SAP MDG or Informatica MDM is a strong plus.

CI/CD for data pipelines (GitHub Actions), version control discipline and infrastructure-as-code basics (Terraform).

Able to read and contribute to architecture documents and understand layers, zones and data flow patterns across a platform. Can engage meaningfully with architects without needing hand-holding.

Familiarity with machine learning is a strong advantage.

What we offer

We offer a hybrid work model which recognizes the value of striking a balance between in-person collaboration and remote working, including up to 25 days per year working from abroad. We believe in rewarding performance and our compensation and benefits package includes a company bonus scheme, pension, employee shares program and multiple employee discounts (detai

About the company

About the Job

The SimpliFi Data Pool is a central data repository hosted on Azure, designed to unify OE financial data and enhance cost efficiency. It serves as the primary source of accounting and reporting data, ensuring compliance with mandatory reporting requirements. The Data Pool is a key component of the SimpliFi program, integrated with SAP, providing a harmonized data model with Common Core components, including a globally harmonized Chart of Accounts and standardized accounting and reporting processes.

The ecosystem is designed to streamline finance data management and reporting through key components: organized data storage (Staging Area, Raw Data Vault, Business Data Vault), tailored feeder system integrations, SAP connectivity via Azure Functions, and a Power BI-enabled consumption layer for detailed analysis and data lineage transparency. It incorporates robust Master and Reference Data Management (Informatica IDMC), rigorous data quality checks, and the SimpliFi Studio
