Hadoop Spark Data Engineer

Capgemini
Charing Cross, United Kingdom
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Azure
Big Data
Computer Programming
Data Integration
ETL
Data Security
Distributed Data Store
Distributed Systems
Hadoop
Hadoop Distributed File System
MapReduce
HBase
Hive
Performance Tuning
Standard SQL
SQL Databases
Sqoop
Data Streaming
Apache YARN
Spark
Apache Flume
Kafka
Functional Programming
Stream Processing

Job description

  • Build, operate, monitor, and troubleshoot Hadoop clusters.
  • Write scalable ETL processes using tools like Hive, Pig, and Spark.
  • Develop and maintain data ingestion processes using Sqoop, Flume, or Kafka.
  • Optimize MapReduce jobs and manage HDFS storage.
  • Collaborate with data scientists and analysts to support data needs.
  • Ensure data security and compliance with organizational policies.
  • Create and maintain technical documentation and playbooks.
  • Evaluate and integrate cloud-based big data solutions (AWS, GCP, Azure).

Requirements

Do you have experience in Spark?

Results-driven Hadoop Spark Data Engineer with strong expertise in designing and implementing scalable big data solutions using Scala and Apache Spark. Experienced in working with the Hadoop ecosystem, including HDFS, Hive, and YARN, to process and analyze large datasets efficiently. Skilled in building robust ETL pipelines, real-time data processing, and optimizing distributed systems for performance and reliability. Proficient in SQL, data modeling, and integrating data from multiple sources in cloud and on-prem environments.

  • Proficient in Scala programming with strong expertise in functional programming concepts for building scalable data applications.
  • Extensive experience in Apache Spark (Core, SQL, and Streaming) for processing large-scale distributed data efficiently.
  • Strong knowledge of Hadoop ecosystem components including HDFS, YARN, Hive, and HBase.
  • Skilled in designing and developing ETL pipelines and handling structured and unstructured big data.
  • Experienced in performance tuning, data optimization, and working with distributed systems in cloud or on-prem environments.

We are a Disability Confident Employer:

Capgemini is proud to be a Disability Confident Employer (Level 2) under the UK Government's Disability Confident scheme. As part of our commitment to inclusive recruitment, we will offer an interview to all candidates who:

  • Declare they have a disability, and
  • Meet the minimum essential criteria for the role.

About the company

Capgemini is one of the world's leading providers of management and IT consulting, technology services, and digital transformation. As a pioneer of innovation, the company supports its clients with their complex challenges around cloud, digital, and platforms.
