Data Engineer (Microsoft Fabric)
Job description
The Data Engineer will work closely with the Business Optimization Team, which is based in the Baarn office and is responsible for all offices in Europe. This team advances the business through the implementation of solution sets using a combination of technology, data, and process changes. The Data Engineer will report to the Global Technology Infrastructure team, based in New York, and is responsible for the design, development, implementation, and operational support of the Cerberus data tier within that team.

The ideal candidate will be able to use appropriate programming languages to build robust, scalable pipelines that ingest, transform, and deliver financial data across domains. They will work with structured and semi-structured data from disparate sources, delivered in different modalities, using tools from the Microsoft technology stack with an emphasis on the Fabric platform. This engineer must be able to work autonomously in a collaborative team environment. Strong written and oral communication skills are paramount for understanding the complexity of the data landscape and for managing deliverables from different stakeholders.

Responsibilities
- Lakehouse/Warehouse Design and Maintenance
  - Architect and maintain Lakehouses using Microsoft Fabric's OneLake and Delta Lake technologies.
  - Implement medallion architecture (Bronze, Silver, Gold layers) for scalable and modular data processing.
  - Optimize storage formats (e.g., Parquet, Delta) for performance and cost-efficiency.
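As a rough illustration of the medallion-style work described above, the following PySpark sketch promotes records from a Bronze table into a Silver Delta table inside a Fabric notebook. The table and column names (bronze_trades, silver_trades, trade_id, trade_date) are hypothetical placeholders, not names from this role's actual estate.

```python
# Bronze -> Silver promotion in a Fabric notebook (all names are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # supplied by the Fabric Spark runtime

# Read raw records landed in the Bronze layer of the lakehouse.
bronze = spark.read.table("bronze_trades")

# Light standardization on the way into Silver: typed dates, trimmed keys, no duplicates.
silver = (
    bronze
    .withColumn("trade_date", F.to_date("trade_date"))
    .withColumn("trade_id", F.trim(F.col("trade_id")))
    .dropDuplicates(["trade_id"])
)

# Persist as a Delta table; Delta/Parquet is the storage format used by OneLake.
silver.write.format("delta").mode("overwrite").saveAsTable("silver_trades")
```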
- Pipeline Development and Orchestration
  - Build robust Data Factory pipelines or Fabric Dataflows Gen2 to ingest data from diverse sources (SQL, APIs, cloud storage, etc.).
  - Schedule and orchestrate ETL/ELT workflows using Fabric Pipelines or notebooks.
  - Implement incremental loads, schema evolution, and error handling mechanisms.
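The incremental-load bullet above can be pictured roughly as follows: a notebook reads the last recorded watermark, appends only the newer rows, and surfaces failures so the orchestrating pipeline can alert. All table and column names (control_watermarks, staging_positions, bronze_positions, modified_at) are illustrative assumptions.

```python
# Watermark-based incremental append; every table and column name here is illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Last high-water mark recorded by the previous successful run.
last_loaded = (
    spark.read.table("control_watermarks")
    .filter(F.col("source_name") == "positions")
    .agg(F.max("loaded_until").alias("wm"))
    .collect()[0]["wm"]
)

# Only rows modified after the watermark are picked up from staging.
incremental = (
    spark.read.table("staging_positions")
    .filter(F.col("modified_at") > F.lit(last_loaded))
)

try:
    incremental.write.format("delta").mode("append").saveAsTable("bronze_positions")
except Exception as exc:
    # Re-raise with context so the orchestrating pipeline can alert and retry.
    raise RuntimeError(f"Incremental load of positions failed: {exc}") from exc
```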
- Data Cleansing and Transformation
  - Apply data quality rules and transformations using Spark notebooks, SQL scripts, or Dataflows.
  - Standardize, enrich, and deduplicate data across ingestion layers.
  - Ensure consistency and referential integrity across medallion layers.
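As a sketch of the cleansing and deduplication work listed above, the snippet below standardizes a few columns, applies a basic quality rule, and keeps only the latest record per business key. The schema (silver_counterparties, counterparty_id, updated_at) is invented for the example.

```python
# Standardize, apply a basic quality rule, and deduplicate; the schema is invented.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("silver_counterparties")

cleaned = (
    df
    .withColumn("country_code", F.upper(F.trim(F.col("country_code"))))  # standardize codes
    .withColumn("name", F.initcap(F.col("name")))                        # normalize casing
    .na.drop(subset=["counterparty_id"])                                  # reject rows missing the key
)

# Deduplicate: keep only the most recent record per business key.
latest_first = Window.partitionBy("counterparty_id").orderBy(F.col("updated_at").desc())
deduped = (
    cleaned
    .withColumn("rn", F.row_number().over(latest_first))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

deduped.write.format("delta").mode("overwrite").saveAsTable("gold_counterparties")
```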
- Monitoring and Observability
  - Monitor pipeline execution, performance, and failures using Fabric's built-in telemetry and alerts.
  - Set up automated alerts for pipeline failures, latency breaches, or data anomalies.
  - Track resource utilization and optimize compute workloads.
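Fabric provides built-in monitoring and alerting; the sketch below only illustrates a complementary pattern in which a notebook appends its own run metrics to a Delta table that custom alerts can query. The log table name (ops_pipeline_runs) and its fields are assumptions for this sketch.

```python
# Append custom run metrics to a Delta log table that alerts can query.
# The table name (ops_pipeline_runs) and its fields are assumptions for this sketch.
import datetime

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

def log_run(pipeline: str, status: str, rows_processed: int, message: str = "") -> None:
    """Record one pipeline run so failures and latency breaches can be queried later."""
    record = Row(
        pipeline=pipeline,
        status=status,
        rows_processed=rows_processed,
        message=message,
        logged_at=datetime.datetime.utcnow(),
    )
    (
        spark.createDataFrame([record])
        .write.format("delta")
        .mode("append")
        .saveAsTable("ops_pipeline_runs")
    )

# Example: called at the end of an ingestion notebook.
log_run("ingest_positions", status="Succeeded", rows_processed=125_000)
```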
- Governance and Compliance
  - Implement data quality checks at ingestion and transformation stages.
  - Maintain data lineage using Microsoft Purview integration for traceability and auditability.
  - Enforce access controls, sensitivity labels, and data retention policies.
  - Document data assets and transformations for transparency and collaboration.
- Collaboration and DevOps
  - Use Infrastructure as Code (IaC) tools like Terraform to provision Fabric resources.
  - Collaborate with analysts, data scientists, and business stakeholders to align data models with business needs.
  - Integrate with Azure DevOps for CI/CD of pipelines and notebooks.
- Performance Tuning and Optimization
  - Profile and tune Spark jobs and SQL queries for optimal performance.
  - Partition and index data for faster access in Lakehouse queries.
  - Manage compute capacity and scale workloads efficiently.
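The tuning bullets above might look like the following in practice: partitioning a Delta table by date so queries prune files, then compacting and cleaning it up. Table and column names are placeholders, and the OPTIMIZE, ZORDER, and VACUUM commands assume the Delta Lake support available in Fabric lakehouses.

```python
# Partition a large Delta table by date, then compact and clean it up.
# Names are placeholders; OPTIMIZE/ZORDER/VACUUM assume the Delta Lake support in Fabric.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

trades = spark.read.table("silver_trades")

# Partition by trade_date so date-bounded queries prune files instead of scanning everything.
(
    trades.write.format("delta")
    .mode("overwrite")
    .partitionBy("trade_date")
    .saveAsTable("silver_trades_partitioned")
)

# Compact small files and co-locate rows on a frequently filtered column.
spark.sql("OPTIMIZE silver_trades_partitioned ZORDER BY (instrument_id)")

# Remove data files no longer referenced by the table (default retention window applies).
spark.sql("VACUUM silver_trades_partitioned")
```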
Requirements
- 7+ years of experience in T-SQL and Python development.
- 3+ years with Fabric, Databricks, or a related lakehouse platform built on the Apache Spark engine.
- Hands-on experience with data lakehouse/warehouse development and principles, including notebook development using PySpark/Spark SQL.
- Excellent troubleshooting skills and ability to apply them under pressure
- Strong experience in logical and physical database design
- Knowledge of cloud platform services for ETL/ELT
- Excellent written and oral communication skills
- Ability to learn and apply new technologies to the existing environment
- Ability to perform work with minimal guidance from other team members
- Willingness to communicate and work with business stakeholders and understand their requirements
Preferred Skills:
- Experience with building pipelines and notebooks using PySpark/SparkSQL.
- Experience with source control platforms such as Git/Azure DevOps, their usage, and the principles of CI/CD.
- Knowledge of Microsoft Azure services including Azure SQL Database, SQL Managed Instance, Azure Synapse Analytics, Cosmos DB, Azure Data Factory, Databricks, Fabric, and Power BI.
- Prior work experience in the financial sector, especially Capital Markets or Asset Management.
- Knowledge of the Real Estate sector and European markets is a plus.
- Certifications such as Azure Data Engineer Associate (DP-203), Fabric Data Engineer Associate (DP-700), Fabric Analytics Engineer Associate (DP-600) are a big plus.
Benefits & conditions
- Salary range: EUR 80,000 - EUR 120,000 gross per year, depending on relevant experience, plus a bonus of approximately 15%
- In addition to the gross salary, the employee is entitled to reimbursement of travel expenses (EUR 0.23 per kilometer), a working-from-home allowance (EUR 2.45 per day), and a net expense allowance based on job level (EUR 1,200 per year)
- 25 vacation days per year
- Disability insurance fully paid by the employer
- Employer-paid pension contribution (8%), including survivor's pension and ANW gap insurance
- The employment contract will initially be entered into for a fixed term of one year
- Restrictions apply with regard to personal investments
- As part of the recruitment process, a background screening must be successfully completed before the employment contract becomes effective
- Other contractual terms and conditions apply