Azure Databricks Engineer
Huxley Associates
Charing Cross, United Kingdom
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Charing Cross, United Kingdom
Tech stack
Airflow
Batch Processing
Continuous Integration
Information Engineering
Data Infrastructure
ETL
Data Vault Modeling
Relational Databases
Python
PostgreSQL
MongoDB
Online Analytical Processing
Online Transaction Processing
Operational Databases
Query Optimization
Reference Data
Azure
SQL Databases
Data Streaming
Data Processing
Database Optimization
Spark
Containerization
Data Lake
Data Lineage
Apache Flink
Production Code
Bicep
Kafka
Vertica
Terraform
Confluent
Databricks
Job description
- Stream and batch processing patterns - late-data handling, watermarking, and backfill strategies; throughput vs latency trade-offs in pipeline design (see the watermarking sketch after this list)
- Production data observability - data lineage, quality checks, SLA monitoring, alerting on freshness and completeness; treating data correctness as a first-class concern (a freshness-check sketch also follows this list)
- CI/CD for data infrastructure - version-controlled pipelines, automated data quality testing, reproducible and auditable deploys
- Ability to work directly with quant researchers, risk managers, and traders - translate business requirements into reliable, well-documented data products
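To make the first bullet concrete, here is a minimal sketch of late-data handling with a watermark in PySpark Structured Streaming; the Kafka topic, checkpoint path, and output table are placeholders, not details from this role.

```python
# Hedged sketch: watermarked windowed aggregation over a Kafka stream.
# Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("late-data-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "trades")  # hypothetical topic
    .load()
)

parsed = events.select(
    F.col("timestamp").alias("event_time"),
    F.col("value").cast("string").alias("payload"),
)

# Accept events up to 10 minutes late; anything older is dropped,
# trading completeness for bounded state and lower latency.
counts = (
    parsed
    .withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "1 minute"))
    .count()
)

query = (
    counts.writeStream
    .outputMode("append")  # a window is emitted once the watermark passes it
    .format("delta")
    .option("checkpointLocation", "/chk/trades_counts")  # hypothetical path
    .start("/tables/trades_counts")
)
```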
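And a hedged illustration of the freshness and completeness checks the observability bullet describes; run_query, the table, and both thresholds are hypothetical, and in practice this would sit behind an orchestrator sensor or a dbt test.

```python
# Hedged sketch: fail the pipeline when data is stale or suspiciously thin.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)
MIN_EXPECTED_ROWS = 1_000  # assumed completeness floor for one load window

def check_freshness(run_query) -> None:
    # run_query is a placeholder for whatever DB client the pipeline uses;
    # it is assumed to return a dict with a tz-aware timestamp and a count.
    row = run_query(
        "SELECT max(ingested_at) AS latest, count(*) AS n "
        "FROM trades WHERE ingested_at >= now() - interval '1 hour'"
    )
    lag = datetime.now(timezone.utc) - row["latest"]
    if lag > FRESHNESS_SLA:
        raise RuntimeError(f"Freshness SLA breached: data is {lag} old")
    if row["n"] < MIN_EXPECTED_ROWS:
        raise RuntimeError(f"Completeness check failed: only {row['n']} rows")
```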
Nice to Have
- Financial markets data - market data feeds (Bloomberg, Refinitiv), tick data, trade history, reference data, or instrument master management
- Apache Spark or Flink for large-scale stream and batch processing beyond the Databricks ecosystem
- dbt or equivalent SQL transformation layer; experience building and maintaining dbt projects in a production data warehouse
- Event streaming with Kafka or Confluent Platform - topic design, consumer group management, exactly-once delivery guarantees (see the transactional-producer sketch after this list)
- OLAP-optimised stores - ClickHouse, DuckDB, or equivalent; understanding of columnar storage and vectorised query execution (a DuckDB sketch also follows this list)
- Energy, commodities, or broader financial markets domain knowledge
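On the exactly-once point, a sketch of a transactional producer with confluent-kafka; the broker address, topic, and transactional.id are placeholders. Consumers only get the all-or-nothing behaviour if they read with isolation.level=read_committed.

```python
# Hedged sketch: exactly-once publishing via Kafka transactions.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker:9092",
    "transactional.id": "etl-loader-1",  # must be stable per producer instance
    "enable.idempotence": True,
})

producer.init_transactions()
producer.begin_transaction()
try:
    for key, value in [("AAPL", b"..."), ("MSFT", b"...")]:
        producer.produce("prices", key=key, value=value)
    # Commit atomically; read_committed consumers never see a partial batch.
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```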
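And for the OLAP bullet, a small DuckDB example: an aggregate over Parquet files scans only the referenced columns, which is the point of columnar storage and vectorised execution. The file glob and column names are assumed.

```python
# Hedged sketch: columnar aggregation over Parquet with DuckDB.
import duckdb

con = duckdb.connect()
daily = con.execute(
    """
    SELECT date_trunc('day', trade_time) AS day,
           symbol,
           sum(quantity * price) AS notional
    FROM read_parquet('trades/*.parquet')  -- hypothetical files
    GROUP BY 1, 2
    ORDER BY 1, 2
    """
).fetchdf()
```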
What We're Looking For
You treat data as a product, not a side effect. You know what it takes to make a pipeline trustworthy - not just running, but observable, tested, and recoverable when something upstream changes at 3am. You think in systems: schema evolution, lineage, freshness SLAs, and the downstream impact of every modelling decision. At ETrading, that data is the foundation of billion-dollar trading decisions. You are the reason it is right.
Requirements
- 6+ years data engineering in production environments; Python expertise - idiomatic, well-tested, production-grade code, not notebook scripts
- ETL/ELT pipeline design and implementation at scale; orchestration with Airflow, Prefect, or equivalent; reliability-first mindset including backfill, retry, and exactly-once semantics (see the DAG skeleton after this list)
- Azure data platform - Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage; infrastructure as code for data workloads (Terraform or Bicep)
- Databricks - Delta Lake, Unity Catalog, job cluster vs interactive cluster trade-offs, cost-aware compute management, Spark job optimisation (a Delta MERGE sketch follows this list)
- Relational databases: PostgreSQL at production scale - query optimisation, indexing strategies, table partitioning, replication, schema design for both OLTP and analytical workloads (partitioning sketch below)
- MongoDB - document modelling, aggregation pipelines, indexing strategy, replica sets; clear judgment on when document vs relational storage is the right architectural call (aggregation-pipeline sketch below)
- Containerisation: Docker and Kubernetes-based deployment of data workloads; reproducible, environment-agnostic data infrastructure
- Data modelling for analytical workloads - dimensional modelling, data vault, or equivalent; schema evolution, slowly changing dimensions, and downstream impact analysis
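A reliability-first Airflow DAG skeleton matching the orchestration bullet: retries with a delay, catchup enabled for historical backfill, and an idempotent task keyed on the logical date so re-runs are safe. The DAG id, schedule, and task body are illustrative (Airflow 2.x API).

```python
# Hedged sketch: idempotent daily load with retries and backfill support.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_partition(ds: str, **_) -> None:
    # Overwrite the partition for `ds` rather than appending, so a retry
    # or backfill run produces identical output to the first attempt.
    print(f"reloading partition {ds}")

with DAG(
    dag_id="daily_trades_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,  # enables backfill of historical logical dates
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    PythonOperator(task_id="load", python_callable=load_partition)
```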
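For the Databricks bullet, a hedged sketch of an idempotent Delta Lake upsert via MERGE, runnable on a Databricks cluster; the staging path, table, and key column are assumed.

```python
# Hedged sketch: upsert a reference-data staging set into a Delta table.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
updates = spark.read.parquet("/staging/instruments")  # hypothetical path

target = DeltaTable.forName(spark, "ref.instruments")  # hypothetical table
(
    target.alias("t")
    .merge(updates.alias("s"), "t.instrument_id = s.instrument_id")
    .whenMatchedUpdateAll()      # re-running the load changes nothing new
    .whenNotMatchedInsertAll()
    .execute()
)
```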
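The partitioning point from the PostgreSQL bullet, sketched as declarative range partitioning driven from Python; the connection string, table, and columns are placeholders.

```python
# Hedged sketch: monthly range partitions keep indexes small and let the
# planner prune untouched months from time-bounded queries.
import psycopg2

DDL = """
CREATE TABLE trades (
    trade_id   bigint      NOT NULL,
    symbol     text        NOT NULL,
    trade_time timestamptz NOT NULL
) PARTITION BY RANGE (trade_time);

CREATE TABLE trades_2024_01 PARTITION OF trades
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE INDEX ON trades_2024_01 (symbol, trade_time);
"""

with psycopg2.connect("dbname=markets") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```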
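And the MongoDB bullet's aggregation pipelines, as a small pymongo sketch that groups server-side so only the aggregated result crosses the wire; the database, collection, and field names are assumed, and $dateTrunc requires MongoDB 5.0+.

```python
# Hedged sketch: daily notional per symbol via an aggregation pipeline.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
orders = client["markets"]["orders"]

pipeline = [
    {"$match": {"status": "filled"}},
    {"$group": {
        "_id": {
            "symbol": "$symbol",
            "day": {"$dateTrunc": {"date": "$executed_at", "unit": "day"}},
        },
        "notional": {"$sum": {"$multiply": ["$quantity", "$price"]}},
    }},
    {"$sort": {"_id.day": 1}},
]
for row in orders.aggregate(pipeline):
    print(row)
```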