Data Engineer

Stafide
Amsterdam, Netherlands
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Amsterdam, Netherlands

Tech stack

Artificial Intelligence
Azure
Cloud Computing
Information Engineering
Data Governance
Data Visualization
Distributed Data Store
Document-Oriented Databases
Python
Metadata Management
Data Streaming
Data Processing
Scripting (Bash/Python/Go/Ruby)
Data Ingestion
PySpark
Data Lineage
Data Management
Data Pipelines
Databricks

Job description

  • Design, build, and maintain scalable data pipelines using Databricks, PySpark, and Azure Data Factory.

  • Develop and optimize data ingestion, transformation, and loading processes for structured and unstructured data sources.

  • Implement data governance frameworks to ensure proper data ownership, compliance, and security standards.

  • Build and maintain reliable data quality validation processes to ensure accuracy, consistency, and completeness of enterprise data.

  • Establish data lineage tracking mechanisms to provide transparency on how data flows across systems and pipelines.

  • Develop and maintain Python scripts and automation tools to streamline data engineering workflows.

  • Collaborate with cross-functional teams, including data scientists, analysts, platform engineers, and business stakeholders, to deliver high-quality datasets for analytics and AI initiatives.

  • Troubleshoot and optimize data pipeline performance, reliability, and cost efficiency in Azure environments.

What We Bring to the Table:

  • Opportunity to work on large-scale cloud data platforms and advanced data engineering solutions.

  • Exposure to modern data stack technologies including Databricks, Azure Data Factory, and distributed processing frameworks.

  • Collaborative environment working with data scientists, analytics teams, and engineering professionals.

  • Challenging projects focused on building scalable and enterprise-grade data platforms.

  • Opportunities to enhance expertise in data governance, data quality, and lineage frameworks.

Requirements

  • 8+ years of experience in data engineering, data platform development, or big data environments.

  • Strong hands-on experience with Databricks and PySpark for distributed data processing.

  • Practical experience working with Azure Data Factory for orchestration and data pipeline development.

  • Advanced Python scripting skills for automation, transformation, and integration tasks.

  • Solid understanding of data governance principles, including metadata management, data cataloging, and compliance.

  • Experience implementing data quality frameworks and validation rules within data pipelines.

  • Hands-on experience with data lineage tracking and documentation practices.

  • Experience working with large-scale data processing frameworks and cloud-based data platforms.

You should possess the ability to:

  • Architect and implement end-to-end data pipelines that support large-scale data processing.

  • Design efficient data models and transformation logic using PySpark and Databricks.

  • Implement data governance and compliance practices within enterprise data ecosystems.

  • Identify and resolve data quality issues proactively through automated validation and monitoring.

  • Track and document data lineage across multiple data sources and transformation layers.

  • Write clean, scalable, and maintainable Python code for data engineering workflows.
