AWS Databricks Engineer

Capgemini

Charing Cross, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Charing Cross, United Kingdom

Tech stack

Airflow

Amazon Web Services (AWS)

Business Analytics Applications

Application Integration Architecture

Azure

Computer Programming

Databases

Continuous Integration

Data Architecture

Data Validation

Information Engineering

Data Governance

Data Infrastructure

ETL

Data Mapping

Data Mart

Data Warehousing

Database Queries

DevOps

Hadoop

Hive

Python

Query Optimization

Power BI

SQL Databases

Workflow Management Systems

Data Logging

Data Ingestion

Spark

Data Layers

Data Lake

PySpark

Semi-structured Data

Data Management

Cloud Migration

API Design

CloudWatch

Data Pipelines

Databricks

Job description

We are looking for an experienced AWS Databricks Engineer with strong hands-on expertise in Databricks, AWS cloud services, PySpark, Spark SQL, Delta Lake, Python, SQL, and data engineering. The candidate will be responsible for designing, developing, optimizing, and supporting scalable data pipelines and Lakehouse solutions for a banking client. The ideal candidate should have strong experience in building enterprise-grade data platforms, processing large volumes of structured and semi-structured data, and implementing secure, reliable, and high-performance data pipelines in AWS-based environments.

Hybrid working: The places that you work from day to day will vary according to your role, your needs, and those of the business; it will be a blend of Company offices, client sites, and your home. Please note that you will be unable to work from home 100% of the time.

Your Role:

Define end-to-end data architecture for Azure-based data platforms using Azure Databricks, Azure Data Factory, ADLS Gen2, Delta Lake, Azure Synapse, and Power BI.
Design scalable and secure lakehouse architecture using bronze, silver, and gold data layers.
Lead architecture and design for data ingestion, transformation, curation, data marts, reporting, and analytics solutions.
Create high-level and low-level data architecture documents, data flow diagrams, integration architecture, and data platform blueprints.
Define architecture patterns for batch, incremental, real-time, and API-based data ingestion.
Design reusable data ingestion and transformation frameworks using ADF and Databricks.
Define data models for London Market Insurance data including policy, claims, premium, broker, bordereaux, delegated authority, reinsurance, exposure, and regulatory reporting data.
Work with business analysts and insurance SMEs to understand London Market business processes and translate requirements into data architecture.
Define standards for data modelling, source-to-target mapping, data quality, reconciliation, metadata, lineage, and auditability.
Design data governance, security, access control, and compliance frameworks for insurance data.
Support cloud migration, data warehouse modernisation, reporting transformation, and legacy system decommissioning initiatives.
Review technical designs, data models, ETL/ELT pipelines, and engineering implementation.
Provide architectural guidance to data engineers working on Azure Databricks, ADF, PySpark, SQL, and Delta Lake.
Collaborate with enterprise architecture, solution architecture, security, infrastructure, DevOps, and business teams.
Define CI/CD, DevOps, deployment, monitoring, and operational support architecture for data platforms.
Identify performance, scalability, reliability, and cost optimisation opportunities across Azure data services.
Support governance forums, architecture review boards, design authority meetings, and client stakeholder workshops.

Your Skills:

Strong hands-on experience with Databricks on AWS.
Strong experience with Apache Spark / PySpark.
Excellent programming skills in Python.
Strong SQL skills including complex queries, joins, CTEs, window functions, and query optimization.
AWS Secrets Manager - Secure secrets and credential management.
Amazon CloudWatch - Monitoring, logging, and alerting.
AWS Step Functions - Workflow orchestration, if applicable.
Amazon RDS / Aurora / Redshift - Source or target databases, where applicable.
Develop and maintain Databricks notebooks, workflows, jobs, and libraries.
Build reusable PySpark frameworks for ingestion, transformation, and data validation.
Implement Delta Lake features such as ACID transactions, schema evolution, time travel, and optimized storage.
Design and implement bronze, silver, and gold layers using medallion architecture.
Tune Databricks clusters for performance and cost optimization.
Monitor Databricks jobs and handle failures, retries, alerts, and job dependencies.
Implement job orchestration using Databricks Workflows, Airflow, AWS Step Functions, or similar tools.
Manage secrets, environment variables, and secure connections.
Support migration from legacy Hadoop/Spark platforms to Databricks on AWS, if required.

We are a Disability Confident Employer: Capgemini is proud to be a Disability Confident Employer (Level 2) under the UK Government's Disability Confident scheme. As part of our commitment to inclusive recruitment, we will offer an interview to all candidates who:

Declare they have a disability, and
Meet the minimum essential criteria for the role.

Requirements

Do you have experience in Spark?

About the company

Capgemini is one of the world's leading providers of management and IT consulting, technology services, and digital transformation. As a pioneer of innovation, the company supports its clients with their complex challenges around cloud, digital, and platforms.
