AWS Databricks Engineer
Job description
We are looking for an experienced AWS Databricks Engineer with strong hands-on expertise in Databricks, AWS cloud services, PySpark, Spark SQL, Delta Lake, Python, SQL, and data engineering. The candidate will be responsible for designing, developing, optimizing, and supporting scalable data pipelines and Lakehouse solutions for a banking client. The ideal candidate should have strong experience in building enterprise-grade data platforms, processing large volumes of structured and semi-structured data, and implementing secure, reliable, and high-performance data pipelines in AWS-based environments.
Hybrid working: The places that you work from day to day will vary according to your role, your needs, and those of the business. It will be a blend of Company offices, client sites, and your home. Please note that you will be unable to work from home 100% of the time.
Your Role:
- Define end-to-end data architecture for Azure-based data platforms using Azure Databricks, Azure Data Factory, ADLS Gen2, Delta Lake, Azure Synapse, and Power BI.
- Design scalable and secure lakehouse architecture using bronze, silver, and gold data layers.
- Lead architecture and design for data ingestion, transformation, curation, data marts, reporting, and analytics solutions.
- Create high-level and low-level data architecture documents, data flow diagrams, integration architecture, and data platform blueprints.
- Define architecture patterns for batch, incremental, real-time, and API-based data ingestion.
- Design reusable data ingestion and transformation frameworks using ADF and Databricks.
- Define data models for London Market Insurance data including policy, claims, premium, broker, bordereaux, delegated authority, reinsurance, exposure, and regulatory reporting data.
- Work with business analysts and insurance SMEs to understand London Market business processes and translate requirements into data architecture.
- Define standards for data modelling, source-to-target mapping, data quality, reconciliation, metadata, lineage, and auditability.
- Design data governance, security, access control, and compliance frameworks for insurance data.
- Support cloud migration, data warehouse modernisation, reporting transformation, and legacy system decommissioning initiatives.
- Review technical designs, data models, ETL/ELT pipelines, and engineering implementation.
- Provide architectural guidance to data engineers working on Azure Databricks, ADF, PySpark, SQL, and Delta Lake.
- Collaborate with enterprise architecture, solution architecture, security, infrastructure, DevOps, and business teams.
- Define CI/CD, DevOps, deployment, monitoring, and operational support architecture for data platforms.
- Identify performance, scalability, reliability, and cost optimisation opportunities across Azure data services.
- Support governance forums, architecture review boards, design authority meetings, and client stakeholder workshops.
Your Skills:
- Strong hands-on experience with Databricks on AWS.
- Strong experience with Apache Spark / PySpark.
- Excellent programming skills in Python.
- Strong SQL skills including complex queries, joins, CTEs, window functions, and query optimization.
- AWS Secrets Manager - Secure secrets and credential management.
- Amazon CloudWatch - Monitoring, logging, and alerting.
- AWS Step Functions - Workflow orchestration, if applicable.
- Amazon RDS / Aurora / Redshift - Source or target databases, where applicable.
- Develop and maintain Databricks notebooks, workflows, jobs, and libraries.
- Build reusable PySpark frameworks for ingestion, transformation, and data validation.
- Implement Delta Lake features such as ACID transactions, schema evolution, time travel, and optimized storage.
- Design and implement bronze, silver, and gold layers using medallion architecture.
- Tune Databricks clusters for performance and cost optimization.
- Monitor Databricks jobs and handle failures, retries, alerts, and job dependencies.
- Implement job orchestration using Databricks Workflows, Airflow, AWS Step Functions, or similar tools.
- Manage secrets, environment variables, and secure connections.
- Support migration from legacy Hadoop/Spark platforms to Databricks on AWS, if required.
We are a Disability Confident Employer: Capgemini is proud to be a Disability Confident Employer (Level 2) under the UK Government's Disability Confident scheme. As part of our commitment to inclusive recruitment, we will offer an interview to all candidates who:
- Declare they have a disability, and
- Meet the minimum essential criteria for the role.
Requirements
Do you have experience in Spark?
About the company
Capgemini is one of the world's leading providers of management and IT consulting, technology services, and digital transformation. As a pioneer of innovation, the company supports its clients with their complex challenges around cloud, digital, and platforms.