Azure Databricks Engineer
Huxley Associates
Charing Cross, United Kingdom
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Charing Cross, United Kingdom
Tech stack
Airflow
Batch Processing
Continuous Integration
Information Engineering
Data Infrastructure
ETL
Data Vault Modeling
Relational Databases
Python
PostgreSQL
MongoDB
Online Analytical Processing
Online Transaction Processing
Operational Databases
Query Optimization
Reference Data
Azure
SQL Databases
Data Streaming
Data Processing
Database Optimization
Spark
Containerization
Data Lake
Data Lineage
Apache Flink
Production Code
Bicep
Kafka
Vertica
Terraform
Confluent
Databricks
Job description
- Stream and batch processing patterns - late-data handling, watermarking, and backfill strategies; throughput vs latency trade-offs in pipeline design (see the watermarking sketch after this list)
- Production data observability - data lineage, quality checks, SLA monitoring, alerting on freshness and completeness; treating data correctness as a first-class concern (a freshness-check sketch also follows this list)
- CI/CD for data infrastructure - version-controlled pipelines, automated data quality testing, reproducible and auditable deploys
- Ability to work directly with quant researchers, risk managers, and traders - translate business requirements into reliable, well-documented data products
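To make the first bullet concrete, here is a minimal sketch of late-data handling with a watermark in PySpark Structured Streaming; the Kafka topic, checkpoint path, and output table are placeholders, not details from this role.

```python
# Hedged sketch: watermarked windowed aggregation over a Kafka stream.
# Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("late-data-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "trades")  # hypothetical topic
    .load()
)

parsed = events.select(
    F.col("timestamp").alias("event_time"),
    F.col("value").cast("string").alias("payload"),
)

# Accept events up to 10 minutes late; anything older is dropped,
# trading completeness for bounded state and lower latency.
counts = (
    parsed
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"))
    .count()
)

query = (
    counts.writeStream
    .outputMode("append")  # a window is emitted once the watermark passes it
    .format("delta")
    .option("checkpointLocation", "/chk/trades_counts")  # hypothetical path
    .start("/tables/trades_counts")
)
```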
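And a hedged illustration of the freshness and completeness checks the observability bullet describes; run_query, the table, and both thresholds are hypothetical, and in practice this would sit behind an orchestrator sensor or a dbt test.

```python
# Hedged sketch: fail the pipeline when data is stale or suspiciously thin.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)
MIN_EXPECTED_ROWS = 1_000  # assumed completeness floor for one load window

def check_freshness(run_query) -> None:
    # run_query is a placeholder for whatever DB client the pipeline uses;
    # it is assumed to return a dict with a tz-aware timestamp and a count.
    row = run_query(
        "SELECT max(ingested_at) AS latest, count(*) AS n "
        "FROM trades WHERE ingested_at >= now() - interval '1 hour'"
    )
    lag = datetime.now(timezone.utc) - row["latest"]
    if lag > FRESHNESS_SLA:
        raise RuntimeError(f"Freshness SLA breached: data is {lag} old")
    if row["n"] < MIN_EXPECTED_ROWS:
        raise RuntimeError(f"Completeness check failed: only {row['n']} rows")
```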
Nice to Have
- Financial markets data - market data feeds (Bloomberg, Refinitiv), tick data, trade history, reference data, or instrument master management
- Apache Spark or Flink for large-scale stream and batch processing beyond the Databricks ecosystem
- dbt or equivalent SQL transformation layer; experience building and maintaining dbt projects in a production data warehouse
- Event streaming with Kafka or Confluent Platform - topic design, consumer group management, exactly-once delivery guarantees (see the transactional-producer sketch after this list)
- OLAP-optimised stores - ClickHouse, DuckDB, or equivalent; understanding of columnar storage and vectorised query execution (a DuckDB sketch also follows this list)
- Energy, commodities, or broader financial markets domain knowledge
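On the exactly-once point, a sketch of a transactional producer with confluent-kafka; the broker address, topic, and transactional.id are placeholders. Consumers only get the all-or-nothing behaviour if they read with isolation.level=read_committed.

```python
# Hedged sketch: exactly-once publishing via Kafka transactions.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker:9092",
    "transactional.id": "etl-loader-1",  # must be stable per producer instance
    "enable.idempotence": True,
})

producer.init_transactions()
producer.begin_transaction()
try:
    for key, value in [("AAPL", b"..."), ("MSFT", b"...")]:
        producer.produce("prices", key=key, value=value)
    # Commit atomically; read_committed consumers never see a partial batch.
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```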
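And for the OLAP bullet, a small DuckDB example: an aggregate over Parquet files scans only the referenced columns, which is the point of columnar storage and vectorised execution. The file glob and column names are assumed.

```python
# Hedged sketch: columnar aggregation over Parquet with DuckDB.
import duckdb

con = duckdb.connect()
daily = con.execute(
    """
    SELECT date_trunc('day', trade_time) AS day,
           symbol,
           sum(quantity * price) AS notional
    FROM read_parquet('trades/*.parquet')  -- hypothetical files
    GROUP BY 1, 2
    ORDER BY 1, 2
    """
).fetchdf()
```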
What We're Looking For
You treat data as a product, not a side effect. You know what it takes to make a pipeline trustworthy - not just running, but observable, tested, and recoverable when something upstream changes at 3am. You think in systems: schema evolution, lineage, freshness SLAs, and the downstream impact of every modelling decision. At ETrading, that data is the foundation of billion-dollar trading decisions. You are the reason it is right.
Requirements
- 6+ years data engineering in production environments; Python expertise - idiomatic, well-tested, production-grade code, not notebook scripts
- ETL/ELT pipeline design and implementation at scale; orchestration with Airflow, Prefect, or equivalent; reliability-first mindset including backfill, retry, and exactly-once semantics (see the DAG skeleton after this list)
- Azure data platform - Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage; infrastructure as code for data workloads (Terraform or Bicep)
- Databricks - Delta Lake, Unity Catalog, job cluster vs interactive cluster trade-offs, cost-aware compute management, Spark job optimisation (a Delta MERGE sketch follows this list)
- Relational databases: PostgreSQL at production scale - query optimisation, indexing strategies, table partitioning, replication, schema design for both OLTP and analytical workloads (partitioning sketch below)
- MongoDB - document modelling, aggregation pipelines, indexing strategy, replica sets; clear judgment on when document vs relational storage is the right architectural call (aggregation-pipeline sketch below)
- Containerisation: Docker and Kubernetes-based deployment of data workloads; reproducible, environment-agnostic data infrastructure
- Data modelling for analytical workloads - dimensional modelling, data vault, or equivalent; schema evolution, slowly changing dimensions, and downstream impact analysis
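A reliability-first Airflow DAG skeleton matching the orchestration bullet: retries with a delay, catchup enabled for historical backfill, and an idempotent task keyed on the logical date so re-runs are safe. The DAG id, schedule, and task body are illustrative (Airflow 2.x API).

```python
# Hedged sketch: idempotent daily load with retries and backfill support.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_partition(ds: str, **_) -> None:
    # Overwrite the partition for `ds` rather than appending, so a retry
    # or backfill run produces identical output to the first attempt.
    print(f"reloading partition {ds}")

with DAG(
    dag_id="daily_trades_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,  # enables backfill of historical logical dates
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    PythonOperator(task_id="load", python_callable=load_partition)
```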
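For the Databricks bullet, a hedged sketch of an idempotent Delta Lake upsert via MERGE, runnable on a Databricks cluster; the staging path, table, and key column are assumed.

```python
# Hedged sketch: upsert a reference-data staging set into a Delta table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.read.parquet("/staging/instruments")  # hypothetical path

target = DeltaTable.forName(spark, "ref.instruments")  # hypothetical table
(
    target.alias("t")
    .merge(updates.alias("s"), "t.instrument_id = s.instrument_id")
    .whenMatchedUpdateAll()      # re-running the load changes nothing new
    .whenNotMatchedInsertAll()
    .execute()
)
```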
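The partitioning point from the PostgreSQL bullet, sketched as declarative range partitioning driven from Python; the connection string, table, and columns are placeholders.

```python
# Hedged sketch: monthly range partitions keep indexes small and let the
# planner prune untouched months from time-bounded queries.
import psycopg2

DDL = """
CREATE TABLE trades (
    trade_id   bigint      NOT NULL,
    symbol     text        NOT NULL,
    trade_time timestamptz NOT NULL
) PARTITION BY RANGE (trade_time);

CREATE TABLE trades_2024_01 PARTITION OF trades
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE INDEX ON trades_2024_01 (symbol, trade_time);
"""

with psycopg2.connect("dbname=markets") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```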
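And the MongoDB bullet's aggregation pipelines, as a small pymongo sketch that groups server-side so only the aggregated result crosses the wire; the database, collection, and field names are assumed, and $dateTrunc requires MongoDB 5.0+.

```python
# Hedged sketch: daily notional per symbol via an aggregation pipeline.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
orders = client["markets"]["orders"]

pipeline = [
    {"$match": {"status": "filled"}},
    {"$group": {
        "_id": {
            "symbol": "$symbol",
            "day": {"$dateTrunc": {"date": "$executed_at", "unit": "day"}},
        },
        "notional": {"$sum": {"$multiply": ["$quantity", "$price"]}},
    }},
    {"$sort": {"_id.day": 1}},
]
for row in orders.aggregate(pipeline):
    print(row)
```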