Hadoop Spark Data Engineer

Capgemini

Charing Cross, United Kingdom

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)

Azure

Big Data

Computer Programming

Data Integration

ETL

Data Security

Distributed Data Store

Distributed Systems

Hadoop

Hadoop Distributed File System

MapReduce

HBase

Hive

Performance Tuning

Standard SQL

SQL Databases

Sqoop

Data Streaming

Apache YARN

Spark

Apache Flume

Kafka

Functional Programming

Stream Processing

Job description

Build, operate, monitor, and troubleshoot Hadoop clusters.
Write scalable ETL processes using tools like Hive, Pig, and Spark.
Develop and maintain data ingestion processes using Sqoop, Flume, or Kafka.
Optimize MapReduce jobs and manage HDFS storage.
Collaborate with data scientists and analysts to support data needs.
Ensure data security and compliance with organizational policies.
Create and maintain technical documentation and playbooks.
Evaluate and integrate cloud-based big data solutions (AWS, GCP, Azure).

Requirements

Do you have experience in Spark?

Results-driven Hadoop Spark Data Engineer with strong expertise in designing and implementing scalable big data solutions using Scala and Apache Spark. Experienced in working with the Hadoop ecosystem, including HDFS, Hive, and YARN, to process and analyze large datasets efficiently. Skilled in building robust ETL pipelines, real-time data processing, and optimizing distributed systems for performance and reliability. Proficient in SQL, data modeling, and integrating data from multiple sources in cloud and on-prem environments.

Proficient in Scala programming with strong expertise in functional programming concepts for building scalable data applications.
Extensive experience in Apache Spark (Core, SQL, and Streaming) for processing large-scale distributed data efficiently.
Strong knowledge of Hadoop ecosystem components including HDFS, YARN, Hive, and HBase.
Skilled in designing and developing ETL pipelines and handling structured and unstructured big data.
Experienced in performance tuning, data optimization, and working with distributed systems in cloud or on-prem environments.

We are a Disability Confident Employer:

Capgemini is proud to be a Disability Confident Employer (Level 2) under the UK Government's Disability Confident scheme. As part of our commitment to inclusive recruitment, we will offer an interview to all candidates who:

Declare they have a disability, and
Meet the minimum essential criteria for the role.

About the company

Capgemini is one of the world's leading providers of management and IT consulting, technology services, and digital transformation. As a pioneer of innovation, the company supports its clients with their complex challenges around cloud, digital, and platforms.
