Data Platform Engineer

Block Labs

Municipality of Madrid, Spain

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Remote

Municipality of Madrid, Spain

Tech stack

Query Performance

API

Artificial Intelligence

Airflow

Amazon Web Services (AWS)

Test Automation

Business Intelligence

Google BigQuery

Databases

Continuous Integration

Data Validation

Data Infrastructure

Amazon DynamoDB

Identity and Access Management

Machine Learning

Operational Databases

Blockchain

Standard SQL

Data Streaming

CloudFormation

PySpark

Kubernetes

Data Lineage

Druid

Kafka

Apache NiFi

Spark Streaming

Presto

Vertica

Terraform

Data Pipelines

Serverless Computing

Redshift

Job description

The Role

Data & Intelligence now sits at the centre of several products we are developing, and we need a platform that is both dependable and capable of supporting more advanced intelligence over time. This role reflects that shift. We are designing a new data platform that will act as the backbone for everything from real-time decisioning to predictive modelling. As a Data Platform Engineer in the Data Team, you will own the end-to-end real-time pipeline, serving data across a unified analytical warehouse and a feature-serving layer. You are not building dashboards. You are engineering the commercial nervous system of a multi-tenant platform designed to scale 10x beyond a single operator with only marginal additional infrastructure cost.

Key Responsibilities

Design, build, and maintain scalable data pipelines using AWS Glue (PySpark) or equivalent orchestration and transformation tools.

Engineer and optimise the ClickHouse warehouse for sub-second query performance across all back-offices.

Implement data contracts between the back-office and the platform. Onboarding a new operator is a config change, not new tables, topics, or feature views.

Build the feature-serving layer that provides pre-computed features to AI agents at millisecond latency.

Integrate with third-party databases, back-office APIs, and external systems (CRM, affiliates, acquisition platforms).

Establish monitoring, alerting, and maintenance procedures, including pipeline health checks, freshness monitoring, anomaly detection, and data contract SLA enforcement.

Own CI/CD and infrastructure-as-code for data workloads.

Collaborate with data scientists, agent engineers, BI developers, and infrastructure teams to translate data requirements into reliable, production-grade pipelines.

Requirements

About You

3+ years building and operating production data pipelines at scale, with hands-on experience in both streaming and batch paradigms.

Expertise in Apache Kafka (or Amazon MSK): topic design, consumer group management, offset handling, schema registry operations, and production troubleshooting of lag, rebalancing, and throughput issues.

Strong SQL and warehouse engineering skills, including experience with columnar analytical databases (ClickHouse strongly preferred; Druid, BigQuery, Redshift, or similar also relevant).

PySpark / Spark Streaming proficiency: writing transformation jobs that normalise, enrich, and enforce business rules on event streams.

Experience with AWS Glue, Apache Airflow, or Apache NiFi is a strong plus.

Data modelling discipline: the ability to design normalised, multi-tenant schemas where tenant isolation is a filter, not a fork.

Experience with data contracts and schema governance.

CI/CD and infrastructure-as-code experience: automated testing of data pipelines, version-controlled deployments (CloudFormation, Terraform, or CDK), and familiarity with containerised workloads (ECS Fargate or Kubernetes).

Data quality and observability mindset: experience implementing pipeline health monitoring, automated data validation (Great Expectations or equivalent), freshness checks, and anomaly detection.

Nice to Have

Experience in iGaming, online casino, poker, or sportsbook platforms.

Exposure to blockchain or crypto-native transaction flows, including on-chain event ingestion, token-denominated accounting, or stablecoin settlement.

Comfortable operating in an AWS-native environment (MSK, Glue, S3, DynamoDB, ECS, IAM). You understand serverless trade-offs and can size infrastructure for cost efficiency.

Feature store experience (SageMaker, Feast, or Tecton), building offline/online feature pipelines that serve ML models at inference time.

Prior work in regulated industries (financial services, gambling, fintech) where data lineage, auditability, and compliance are non-negotiable.

Experience migrating legacy query engines (Athena, Trino, Presto) to modern analytical warehouses, with reconciliation frameworks to validate correctness.

How We Work

Fully remote with asynch

About the company

About Block Labs

Block Labs is a premier technology studio operating at the bleeding edge of Web3, Artificial Intelligence, and iGaming. We don't just ship features; we engineer high-scale, production-grade platforms that power the next generation of digital products. We are a collective of senior engineers, product strategists, and builders who refuse to compromise on architecture. Whether we are designing autonomous multi-agent AI systems, building decentralized financial infrastructure, or architecting high-frequency iGaming platforms, our standard is excellence. We move fast, but we build for the long term. If you are looking to work alongside a team that values deep technical expertise, thoughtful system design, and product ownership, Block Labs is where you belong.