Data Engineer (SC Cleared)
Job description
A hands-on data engineering role within a large-scale, cloud-native data programme. You will design, build, maintain, and troubleshoot data pipelines that process large volumes of data across a modern AWS-native stack - using Apache Spark and PySpark for distributed data processing, Apache Airflow for orchestration, and a broad suite of AWS services for storage, compute, and analytics. You will apply strong analytical and engineering skills to deliver trusted, well-governed data assets to downstream users.
Data pipeline development
Build and maintain scalable data pipelines using Apache Spark and PySpark, processing and transforming large datasets across distributed cloud infrastructure.
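As a rough illustration of this kind of pipeline work, here is a minimal PySpark sketch - reading raw data from S3, applying a simple cleaning step, and writing a partitioned output. The bucket paths, column names, and transformation logic are hypothetical examples, not details of the programme.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch: read raw records from S3, apply a simple cleaning
# step, and write a partitioned curated output.
# All paths and column names are hypothetical examples.
spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

raw = spark.read.parquet("s3://example-raw-bucket/claims/")

cleaned = (
    raw.dropDuplicates(["claim_id"])                 # de-duplicate on a business key
       .withColumn("received_date", F.to_date("received_ts"))
       .filter(F.col("amount").isNotNull())          # drop incomplete records
)

(cleaned.write
        .mode("overwrite")
        .partitionBy("received_date")                # partition for efficient querying
        .parquet("s3://example-curated-bucket/claims/"))
```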
Workflow orchestration
Configure and manage Apache Airflow DAGs for task orchestration, ensuring reliable scheduling, monitoring, and execution of data processing workflows.
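For illustration, a minimal DAG sketch wiring two dependent tasks with retries, assuming Airflow 2.4+; the DAG id, schedule, and task logic are hypothetical examples.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task callables standing in for real pipeline steps.
def extract():
    print("extract step")

def transform():
    print("transform step")

with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                               # run once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t1 >> t2                                         # transform runs after extract succeeds
```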
Root cause analysis
Perform data analysis to identify and resolve root causes of pipeline failures and data quality issues - including reviewing EMR output logs and CloudWatch metrics.
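A hedged example of what that triage can look like in practice - scanning a CloudWatch Logs group (for example, one receiving EMR application logs) for recent ERROR events with boto3. The log group name is a hypothetical example.

```python
import time
import boto3

# Minimal triage sketch: scan a CloudWatch log group for ERROR lines
# from the last hour. The log group name is a hypothetical example.
logs = boto3.client("logs")

now_ms = int(time.time() * 1000)
resp = logs.filter_log_events(
    logGroupName="/aws/emr/example-cluster",   # hypothetical log group
    startTime=now_ms - 3600 * 1000,            # last 60 minutes
    filterPattern="ERROR",                     # CloudWatch Logs filter syntax
)

for event in resp["events"]:
    print(event["timestamp"], event["message"][:200])
```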
Data modelling
Apply understanding of dimensional data models and slowly changing dimensions (SCD) to design and maintain well-structured, analytically trusted data assets.
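As a sketch of the SCD Type 2 pattern (expire the current row, append a new version), assuming hypothetical dim_customer and staging_customer tables and a tracked address attribute:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-example").getOrCreate()

# Current dimension contents and the latest source snapshot.
# Table and column names are hypothetical examples.
dim = spark.table("dim_customer").alias("d")
src = spark.table("staging_customer").alias("s")

# Keys whose tracked attribute has changed since the current version.
changed_keys = (
    dim.filter("is_current = true")
       .join(src, F.col("d.customer_id") == F.col("s.customer_id"))
       .filter(F.col("d.address") != F.col("s.address"))
       .select("d.customer_id")
)

# Type 2: expire the current versions of the changed keys...
expired = (
    dim.filter("is_current = true")
       .join(changed_keys, "customer_id", "left_semi")
       .withColumn("is_current", F.lit(False))
       .withColumn("valid_to", F.current_date())
)

# ...and append the incoming rows as the new current versions.
new_rows = (
    src.join(changed_keys, "customer_id", "left_semi")
       .withColumn("is_current", F.lit(True))
       .withColumn("valid_from", F.current_date())
       .withColumn("valid_to", F.lit(None).cast("date"))
)
# In practice the expired and new rows are merged back into the
# dimension table (e.g. via a MERGE in Delta Lake or Iceberg).
```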
Infrastructure as code
Provision and manage cloud infrastructure using Terraform. Containerise solutions using Docker and manage deployments through GitLab CI/CD pipelines and release tagging.
Security & encryption
Apply understanding of both server-side and client-side encryption patterns within AWS. Work within IAM policies and data governance standards appropriate to a regulated government environment.
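For the server-side pattern, a minimal boto3 sketch; the bucket name, object key, and KMS alias are hypothetical examples. Client-side encryption would instead encrypt the payload before upload, typically with the AWS Encryption SDK.

```python
import boto3

# Minimal server-side encryption sketch: write an object to S3
# encrypted with a customer-managed KMS key. Bucket, key, and KMS
# alias are hypothetical examples.
s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-secure-bucket",
    Key="reports/2024/output.parquet",
    Body=b"...payload...",
    ServerSideEncryption="aws:kms",        # SSE-KMS
    SSEKMSKeyId="alias/example-data-key",  # customer-managed key alias
)
```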
Requirements
You will apply strong data analysis skills to identify root causes of data issues, work with dimensional data models and slowly changing dimensions, and implement infrastructure as code using Terraform. Familiarity with DWP engineering best practices and the ability to translate customer expectations into applied technical functionality are key to success in this role.
Technical skills required
Languages & analytics
- Python - primary language for pipeline development and data processing
- SQL - used for querying, transformation, and validation across data stores
- PySpark - for distributed data processing using Apache Spark on AWS EMR
- Familiarity with basic data structures for constructing robust, scalable solutions
Data processing & orchestration
- Apache Spark - understanding of distributed data processing architecture and execution
- Apache Airflow - configuring DAGs and managing task orchestration at scale
- Jupyter Notebooks - for exploratory data analysis and pipeline prototyping
- Understanding of dimensional data models and slowly changing dimensions (SCD Types 1, 2, 3)
- Data analysis skills to identify root causes of issues within pipelines and data assets
AWS services
- Amazon EMR - running Spark workloads and reviewing output logs
- Amazon Athena - ad hoc querying of data in S3 (see the sketch after this list)
- Amazon Textract and Comprehend - familiarity with AI/ML document extraction and NLP services
- AWS S3, IAM, CloudWatch, EC2, ECR - core platform services used day-to-day
- AWS console proficiency - navigating, configuring, and monitoring services
- Understanding of server-side and client-side encryption within AWS
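As referenced in the Amazon Athena item above, a minimal boto3 sketch of an ad hoc query over data in S3; the database, query, and results location are hypothetical examples.

```python
import time
import boto3

# Minimal Athena sketch: run an ad hoc query over data in S3 and
# print the first page of results. Names are hypothetical examples.
athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SELECT received_date, count(*) FROM claims GROUP BY 1",
    QueryExecutionContext={"Database": "example_curated"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```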
Infrastructure, DevOps & delivery
- Terraform - Infrastructure as Code for provisioning and managing AWS environments
- Docker - containerisation of data engineering solutions
- GitLab - source code management, CI/CD pipeline configuration, release tagging, and component versioning
- Familiarity with DWP engineering best practices
- Ability to translate customer expectations into applied, functional technical solutions