Data Platform Engineer
Job description
reflects that shift. We are designing a new data platform that will act as the backbone for everything from real-time decisioning to predictive modelling. As a Data Platform Engineer in the Data Team, you will own the end-to-end real-time pipeline, serving data across a unified analytical warehouse and feature-serving layer. You are not building dashboards. You are engineering the commercial nervous system of a multi-tenant platform designed to scale from one operator to 10x with only marginal infrastructure cost.
Key Responsibilities
Design, build, and maintain scalable data pipelines using AWS Glue (PySpark) or equivalent orchestration and transformation tools.
Engineer and optimise the ClickHouse warehouse for sub-second query performance across all back-offices.
Implement data contracts between each back-office and the platform. Onboarding a new operator is a config change, not new tables, topics, or feature views.
Build the feature-serving layer that provides pre-computed features to AI agents at millisecond latency.
Integrate with third-party databases, back-office APIs, and external systems (CRM, affiliates, acquisition platforms).
Establish monitoring, alerting, and maintenance procedures, including pipeline health checks, freshness monitoring, anomaly detection, and data contract SLA enforcement.
Own CI/CD and infrastructure-as-code for data workloads.
Collaborate with data scientists, agent engineers, BI developers, and infrastructure teams to translate data requirements into reliable, production-grade pipelines.
About You
3+ years building and operating production data pipelines at scale, with hands-on experience across both streaming and batch paradigms.
Expertise in Apache Kafka (or Amazon MSK): topic design, consumer group management, offset handling, schema registry operations, and production troubleshooting of lag, rebalancing, and throughput issues.
Strong SQL and warehouse engineering skills: experience with columnar analytical databases (ClickHouse strongly preferred, or similar: Druid, BigQuery, Redshift).
PySpark / Spark Streaming proficiency: writing transformation jobs that normalise, enrich, and enforce business rules on event streams. Experience with AWS Glue, Apache Airflow, or Apache NiFi is a strong plus.
Data modelling discipline: ability to design normalised, multi-tenant schemas where tenant isolation is a filter, not a fork. Experience with data contracts and schema governance.
CI/CD and infrastructure-as-code experience: automated testing of data pipelines, version-controlled deployments (CloudFormation, Terraform, or CDK), and familiarity with containerised workloads (ECS Fargate or Kubernetes).
Data quality and observability mindset: experience implementing pipeline health monitoring, automated data validation (Great Expectations or equivalent), freshness checks, and anomaly detection.
Nice to Have
Experience in iGaming, online casino, poker, or sportsbook platforms.
Exposure to blockchain or crypto-native transaction flows, including on-chain event ingestion, token-denominated accounting, or stablecoin settlement.
Comfortable operating in an AWS-native environment (MSK, Glue, S3, DynamoDB, ECS, IAM). You understand serverless tradeoffs and can size infrastructure for cost efficiency.
Feature store experience (SageMaker, Feast, or Tecton) building offline/online feature pipelines that serve ML models at inference time.
Prior work in regulated industries (financial services, gambling, fintech) where data lineage, auditability, and compliance are non-negotiable.
Experience migrating legacy query engines (Athena, Trino, Presto) to modern analytical warehouses with reconciliation frameworks to validate correctness.
How We Work
Fully remote with asynch