Developer

Roleyou
Bramley, United Kingdom
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Bramley, United Kingdom

Tech stack

API
Azure
Code Generation
Software Quality
Data Validation
Data Governance
ETL
Data Masking
Data Systems
Distributed Computing Environment
Hive
Infrastructure as a Service (IaaS)
JSON
Jinja (Template Engine)
Python
NoSQL
Raw Data
Reference Data
Mockito
Kusto Query Language
Data Streaming
YAML
Enterprise Data Management
Parquet
Datadog
Data Processing
Spark
Indexer
Gitlab
Microsoft Fabric
Pytest
Data Lake
PySpark
Gitlab-ci
Integration Tests
Cosmos DB
Data Pipelines
Artifactory

Job description

The Role
You will be part of a specialist engineering team responsible for designing, building, and optimising end-to-end financial instrument mastering pipelines. These pipelines span ingestion, normalisation, bi-temporal processing, and publication into enterprise data platforms. You will work closely with data architects, domain experts, and QC engineers to deliver scalable, reliable, and high-performance data solutions across Azure and Microsoft Fabric ecosystems.

Key Responsibilities
Build and maintain PySpark-based data pipelines for financial instrument mastering across multiple data sources
Design and implement bi-temporal data processing models (system time + valid time) including Slice, Resolve, Coalesce, and Diff logic
Develop optimised Azure Cosmos DB data models, including partitioning, indexing, change feed processing, and point-read optimisation
Integrate external APIs for entity resolution and matching services (PermID / IAAS) with robust retry and batching mechanisms
Design publication pipelines to convert bi-temporal data into uni-temporal outputs and publish via Microsoft Fabric / Parquet-based lakehouse architectures
Implement data quality frameworks using Great Expectations to ensure accuracy and compliance
Build robust unit and integration tests using PyTest for PySpark and Cosmos DB components
Support and maintain CI/CD pipelines (GitLab CI) including Python packaging, Artifactory deployment, and ARM-based infrastructure provisioning
Work with YAML-driven configuration for mastering rules, schemas, and environment setup
Monitor and troubleshoot production pipelines using Eventstream telemetry, KQL, and DataDog observability tools
Deliver scalable transformation logic, optimised aggregations, and high-performance data processing workflows
Implement data governance controls including data masking, role-based access, and compliance policies
Continuously tune and optimise workloads for performance, cost efficiency, and reliability

Requirements

Required Skills & Experience
Strong experience in Python and PySpark (Spark SQL, DataFrame API, Structured Streaming)
Hands-on experience building large-scale ETL / streaming data pipelines
Experience working with Azure Cosmos DB (NoSQL) including data modelling and performance tuning
Strong knowledge of Azure Data Lake Storage (ADLS / OneLake / ABFS)
Experience implementing bi-temporal or SCD Type 2 data models
Strong understanding of data quality frameworks (e.g., Great Expectations)
Experience with CI/CD pipelines (GitLab / Azure DevOps) and automated deployments
Strong testing discipline using PyTest, mocking, and integration testing approaches
Experience working with YAML/JSON configuration and infrastructure-as-code (ARM templates)
Strong understanding of distributed data processing and Spark-based architectures
Experience working with financial or time-series datasets (market data, reference data, risk data preferred)
Strong communication skills and ability to work with cross-functional stakeholders

Desirable Experience
Microsoft Fabric (Notebooks, Eventstream, Lakehouses, Spark Job Definitions)
Financial instrument/reference data (ISIN, CUSIP, LEI, PermID)
Entity resolution / matching systems and enrichment APIs
Delta Lake and Change Data Feed (CDF)
Cosmos DB performance optimisation (RU tuning, bulk operations, concurrency)
Jinja2 templating or code generation approaches
SonarQube or similar code quality tooling
Monorepo development with modern Python packaging tools (uv / Hatchling)
Power BI / semantic modelling experience
Knowledge of financial compliance standards (GDPR, SOX)

Technology Stack
Python 3.11+, PySpark 3.5, Spark SQL
Azure Cosmos DB, ADLS, OneLake, Delta Lake, Parquet
Microsoft Fabric (Eventstream, Notebooks, Lakehouse)
Great Expectations, LSEG Data Validation frameworks
GitLab CI/CD, JFrog Artifactory, ARM Templates
DataDog, Eventstream, KQL monitoring
Azure Key Vault, Azure CLI, Fabric APIs

Why Join
Work on a global financial markets transformation programme
Hands-on with next-generation Azure + Fabric data platforms
Exposure to bi-temporal modelling and financial instrument mastering systems
High-impact engineering role with modern cloud and streaming architecture
Opportunity to work with leading domain and technical experts in a regulated environment
