AWS Databricks Engineer

Capgemini
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Airflow
Amazon Web Services (AWS)
Business Analytics Applications
Application Integration Architecture
Azure
Computer Programming
Databases
Continuous Integration
Data Architecture
Data Validation
Information Engineering
Data Governance
Data Infrastructure
ETL
Data Mapping
Data Mart
Data Warehousing
Database Queries
DevOps
Hadoop
Hive
Python
Query Optimization
Power BI
SQL Databases
Workflow Management Systems
Data Logging
Data Ingestion
Spark
Data Layers
Data Lake
PySpark
Semi-structured Data
Data Management
Cloud Migration
Api Design
Cloudwatch
Data Pipelines
Databricks

Job description

We are looking for an experienced AWS Databricks Engineer with strong hands-on expertise in Databricks, AWS cloud services, PySpark, Spark SQL, Delta Lake, Python, SQL, and data engineering. The candidate will be responsible for designing, developing, optimizing, and supporting scalable data pipelines and Lakehouse solutions for a banking client. The ideal candidate should have strong experience in building enterprise-grade data platforms, processing large volumes of structured and semi-structured data, and implementing secure, reliable, and high-performance data pipelines in AWS-based environments.

Hybrid working: The places that you work from day to day will vary according to your role, your needs, and those of the business; it will be a blend of Company offices, client sites, and your home. Please note that you will be unable to work from home 100% of the time.

Your Role:

  • Define end-to-end data architecture for Azure-based data platforms using Azure Databricks, Azure Data Factory, ADLS Gen2, Delta Lake, Azure Synapse, and Power BI.
  • Design scalable and secure lakehouse architecture using bronze, silver, and gold data layers.
  • Lead architecture and design for data ingestion, transformation, curation, data marts, reporting, and analytics solutions.
  • Create high-level and low-level data architecture documents, data flow diagrams, integration architecture, and data platform blueprints.
  • Define architecture patterns for batch, incremental, real-time, and API-based data ingestion.
  • Design reusable data ingestion and transformation frameworks using ADF and Databricks.
  • Define data models for London Market Insurance data including policy, claims, premium, broker, bordereaux, delegated authority, reinsurance, exposure, and regulatory reporting data.
  • Work with business analysts and insurance SMEs to understand London Market business processes and translate requirements into data architecture.
  • Define standards for data modelling, source-to-target mapping, data quality, reconciliation, metadata, lineage, and auditability.
  • Design data governance, security, access control, and compliance frameworks for insurance data.
  • Support cloud migration, data warehouse modernisation, reporting transformation, and legacy system decommissioning initiatives.
  • Review technical designs, data models, ETL/ELT pipelines, and engineering implementation.
  • Provide architectural guidance to data engineers working on Azure Databricks, ADF, PySpark, SQL, and Delta Lake.
  • Collaborate with enterprise architecture, solution architecture, security, infrastructure, DevOps, and business teams.
  • Define CI/CD, DevOps, deployment, monitoring, and operational support architecture for data platforms.
  • Identify performance, scalability, reliability, and cost optimisation opportunities across Azure data services.
  • Support governance forums, architecture review boards, design authority meetings, and client stakeholder workshops.

Your Skills:

  • Strong hands-on experience with Databricks on AWS.
  • Strong experience with Apache Spark / PySpark.
  • Excellent programming skills in Python.
  • Strong SQL skills including complex queries, joins, CTEs, window functions, and query optimization.
  • AWS Secrets Manager - Secure secrets and credential management.
  • Amazon CloudWatch - Monitoring, logging, and alerting.
  • AWS Step Functions - Workflow orchestration, if applicable.
  • Amazon RDS / Aurora / Redshift - Source or target databases, where applicable.
  • Develop and maintain Databricks notebooks, workflows, jobs, and libraries.
  • Build reusable PySpark frameworks for ingestion, transformation, and data validation.
  • Implement Delta Lake features such as ACID transactions, schema evolution, time travel, and optimized storage.
  • Design and implement bronze, silver, and gold layers using medallion architecture.
  • Tune Databricks clusters for performance and cost optimization.
  • Monitor Databricks jobs and handle failures, retries, alerts, and job dependencies.
  • Implement job orchestration using Databricks Workflows, Airflow, AWS Step Functions, or similar tools.
  • Manage secrets, environment variables, and secure connections.
  • Support migration from legacy Hadoop/Spark platforms to Databricks on AWS, if required.

We are a Disability Confident Employer: Capgemini is proud to be a Disability Confident Employer (Level 2) under the UK Government's Disability Confident scheme. As part of our commitment to inclusive recruitment, we will offer an interview to all candidates who:

  • Declare they have a disability, and
  • Meet the minimum essential criteria for the role.

Requirements

Do you have experience in Spark?

About the company

Capgemini is one of the world's leading providers of management and IT consulting, technology services, and digital transformation. As a pioneer of innovation, the company supports its clients with their complex challenges around cloud, digital, and platforms.
