Data Engineer

Stafide
Amsterdam, Netherlands
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Amsterdam, Netherlands

Tech stack

Artificial Intelligence
Azure
Cloud Computing
Information Engineering
Data Governance
Data Visualization
Distributed Data Store
Document-Oriented Databases
Python
Metadata Management
Data Streaming
Data Processing
Scripting (Bash/Python/Go/Ruby)
Data Ingestion
PySpark
Data Lineage
Data Management
Data Pipelines
Databricks

Job description

  • Design, build, and maintain scalable data pipelines using Databricks, PySpark, and Azure Data Factory.

  • Develop and optimize data ingestion, transformation, and loading processes for structured and unstructured data sources.

  • Implement data governance frameworks to ensure proper data ownership, compliance, and security standards.

  • Build and maintain reliable data quality validation processes to ensure accuracy, consistency, and completeness of enterprise data.

  • Establish data lineage tracking mechanisms to provide transparency on how data flows across systems and pipelines.

  • Develop and maintain Python scripts and automation tools to streamline data engineering workflows.

  • Collaborate with cross-functional teams, including data scientists, analysts, platform engineers, and business stakeholders, to deliver high-quality datasets for analytics and AI initiatives.

  • Troubleshoot and optimize data pipeline performance, reliability, and cost efficiency in Azure environments.

What We Bring to the Table:

  • Opportunity to work on large-scale cloud data platforms and advanced data engineering solutions.

  • Exposure to modern data stack technologies including Databricks, Azure Data Factory, and distributed processing frameworks.

  • Collaborative environment working with data scientists, analytics teams, and engineering professionals.

  • Challenging projects focused on building scalable and enterprise-grade data platforms.

  • Opportunities to enhance expertise in data governance, data quality, and lineage frameworks.

Requirements

  • 8+ years of experience in data engineering, data platform development, or big data environments.

  • Strong hands-on experience with Databricks and PySpark for distributed data processing.

  • Practical experience working with Azure Data Factory for orchestration and data pipeline development.

  • Advanced Python scripting skills for automation, transformation, and integration tasks.

  • Solid understanding of data governance principles, including metadata management, data cataloging, and compliance.

  • Experience implementing data quality frameworks and validation rules within data pipelines.

  • Hands-on experience with data lineage tracking and documentation practices.

  • Experience working with large-scale data processing frameworks and cloud-based data platforms.

You should possess the ability to:

  • Architect and implement end-to-end data pipelines that support large-scale data processing.

  • Design efficient data models and transformation logic using PySpark and Databricks.

  • Implement data governance and compliance practices within enterprise data ecosystems.

  • Identify and resolve data quality issues proactively through automated validation and monitoring.

  • Track and document data lineage across multiple data sources and transformation layers.

  • Write clean, scalable, and maintainable Python code for data engineering workflows.
