Senior Data Infrastructure Engineer - Build a (self-hosted) Greenfield Data Lake

i3D.net

Capelle aan den IJssel, Netherlands

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Capelle aan den IJssel, Netherlands

Tech stack

API

Airflow

Amazon Web Services (AWS)

Apache HTTP Server

Computing Platforms

Business Systems

Data Centers

Information Engineering

Data Infrastructure

Debian Linux

Linux

MariaDB

Open Source Technology

RabbitMQ

Raw Data

Ansible

Prometheus

Standard SQL

Spark

Data Lake

Kubernetes

Apache Flink

Bare Metal

Vertica

REST

Data Pipelines

Docker

Job description

You report to the Senior Engineering Manager.

Design the Data Lake architecture: Define the storage, ingestion, and transformation layers of a self-hosted data platform, selecting the right open-source-first tool for each.
Build data pipelines: Create reliable pipelines that ingest data from across the company - infrastructure metrics, business systems, application logs, and financial data.
Model and transform data: Design schemas and transformation layers that make raw data usable, consistent, and queryable.
Integrate with existing systems: Connect to the company's current data sources (MariaDB, Prometheus, OpenSearch, internal APIs, and others) without disrupting production workloads.
Operate what you build: Own the reliability and performance of the data platform, including monitoring, alerting and capacity planning. Work closely with our Live Operations and Engineering teams to ensure it remains sustainable.
Collaborate across teams: Work with Platform, Infrastructure, Network, and Product teams to understand their data and make it accessible.
Document and share: Maintain clear documentation of the platform architecture, data catalog, and pipeline designs, so the foundation you build is understandable and extensible.

The data platform is greenfield - you'll have significant input into the final choices. As a starting point, we expect the stack to be self-hosted and open-source, in line with how i3D.net operates. Tools we'd expect you to evaluate and build on include:

Storage: MinIO (S3-compatible object storage), Apache Iceberg or Delta Lake
Processing: Apache Spark, Apache Flink
Orchestration: Apache Airflow
Query: Trino, ClickHouse
Data sources you'll integrate with: MariaDB, Prometheus, OpenSearch, RabbitMQ, internal & external REST APIs
Infrastructure: Linux (Debian), Docker, Kubernetes, Ansible

You've designed and deployed the core Data Lake architecture on our own infrastructure
Data from at least the major source systems is flowing into the platform reliably
There's a working transformation layer that makes raw data queryable and usable
Other teams can access and explore data without needing your help for every question
The platform is documented, monitored, and ready to support analytics use cases as they emerge

Pragmatic architect: You make sound technical decisions, document your reasoning, and know when "good enough" beats "perfect".
Independent operator: You thrive with autonomy. You'll be the first data engineer - you need to drive your own roadmap with input from your manager and stakeholders.
Collaborative mindset: You work well across teams to understand data sources and make them accessible.

Remote vs Onsite: This is a hybrid role, so you'll spend some time working onsite at our Rotterdam office. If you're already in the Netherlands, great. If not, we're happy to support your move with our relocation services (a valid EU work permit is required).

Build from zero: This is a rare greenfield opportunity to design an entire data platform from the ground up, on real hardware, at global scale.
Full ownership: You'll make the architectural calls and see them through to production - no layers of approval or committee decisions.
Infrastructure, not abstractions: Work with bare metal, your own data centers, and open-source tooling - not cloud dashboards.
Competitive Perks: Annual bonus, 25 vacation days (excluding national holidays), travel allowance, and a solid pension plan.
Career Growth: Access education reimbursement, career guidance, and opportunities to upskill.
Stay Active: Free access to our in-house gym in Rotterdam.

Requirements

Data engineering depth: 6-8 years of experience building and operating data pipelines, storage layers, and transformation frameworks in production environments.
Open-source data stack experience: Hands-on with tools like Spark, Airflow, Trino, or similar - ideally self-hosted rather than managed cloud equivalents.
Greenfield builder: You have experience building data infrastructure from scratch, not just maintaining existing platforms.
Strong SQL and data modeling: You can design schemas that balance analytical flexibility with performance.
Infrastructure-aware: Comfortable with Linux, containers, Kubernetes, and operating your own services. You don't need a cloud console to get things done.

About the company

At i3D.net, we provide world-class global coverage with one of the most interconnected networks in the world. Our solutions focus on low latency, zero packet loss, and unmatched scalability, enabling seamless experiences for millions of users worldwide. With dedicated support, bespoke solutions, and cutting-edge technology, we deliver reliable, cost-efficient infrastructure that empowers game developers and businesses to scale effortlessly. Partnering with major names like Nvidia, DuckDuckGo, Discord, and Ubisoft, we are shaping the future of gaming and network technology.

i3D.net runs infrastructure across more than 60 locations, serving millions of users, but today the data produced by our systems lives in silos. Infrastructure metrics, business transactions, application logs and financial data all exist, but none of them are aggregated, transformed or made available for decision making. As our first Senior Data Systems Engineer, you'll change that. You'll design and build i3D.net's Data Lake from scratch - a self-hosted, open-source-first data platform that brings all of this together into a single foundation. This is a greenfield build with real ownership: you'll make the architectural decisions, lay out the groundwork, and create the data infrastructure the company will rely on for years to come.