Data Engineer (SC Cleared)

Scrumconnect Ltd
Newcastle upon Tyne, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Newcastle upon Tyne, United Kingdom

Tech stack

Artificial Intelligence
Airflow
Amazon Web Services (AWS)
Data analysis
Big Data
Cloud Computing
Computer Security
System Configuration
Continuous Integration
Directed Acyclic Graphs (DAGs)
Information Engineering
Data Governance
Data Stores
Data Warehousing
DevOps
Distributed Computing Environment
Identity and Access Management
Python
SQL Databases
Workflow Management Systems
Jupyter Notebook
Data Processing
Spark
Gitlab
Containerization
PySpark
Gitlab-ci
Cloudwatch
Terraform
Software Version Control
Data Pipelines
Docker
Service Stack

Job description

A hands-on data engineering role within a large-scale cloud data programme. You will design, build, maintain, and troubleshoot data pipelines that process large volumes of data across a modern AWS-native stack, using Apache Spark and PySpark for distributed data processing, Apache Airflow for orchestration, and a broad suite of AWS services for storage, compute, and analytics. You will apply strong analytical and engineering skills to deliver reliable, well-governed data assets to downstream users in a modern, cloud-native environment.

Data pipeline development

Build and maintain scalable data pipelines using Apache Spark and PySpark, processing and transforming large datasets across distributed cloud infrastructure.

Workflow orchestration

Configure and manage Apache Airflow DAGs for task orchestration, ensuring reliable scheduling, monitoring, and execution of data processing workflows.
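Airflow itself is not reproduced here, but the core idea a DAG encodes (tasks executed in dependency order, never cyclically) can be sketched with Python's standard-library `graphlib`. The task names below are hypothetical, standing in for steps of a typical extract/transform/load workflow.

```python
from graphlib import TopologicalSorter

# Illustrative task dependencies for a daily pipeline (task names are
# hypothetical): each key lists the tasks it depends on.
dag = {
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
}

def run_order(dependencies):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(dependencies).static_order())
```

An orchestrator such as Airflow does essentially this at scale, adding scheduling, retries, and monitoring on top of the dependency ordering.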

Root cause analysis

Perform data analysis to identify and resolve root causes of pipeline failures and data quality issues - including reviewing EMR output logs and CloudWatch metrics.
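A first step in this kind of triage is often mechanical: scanning job logs for error-level entries before digging into metrics. A minimal sketch, assuming a simple "timestamp level message" log layout (real EMR and CloudWatch log formats vary by service and configuration):

```python
import re

def find_failures(log_lines):
    """Collect (line_number, message) pairs for ERROR-level log entries.

    The log format matched here (timestamp, time, level, message) is an
    illustrative assumption, not a specific EMR format.
    """
    pattern = re.compile(r"^\S+ \S+ ERROR (?P<msg>.*)$")
    failures = []
    for lineno, line in enumerate(log_lines, start=1):
        match = pattern.match(line)
        if match:
            failures.append((lineno, match.group("msg")))
    return failures

sample = [
    "2024-01-01 00:00:01 INFO Step started",
    "2024-01-01 00:00:05 ERROR Container killed: OutOfMemoryError",
    "2024-01-01 00:00:06 INFO Step finished",
]
```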

Data modelling

Apply understanding of dimensional data models and slowly changing dimensions (SCD) to design and maintain well-structured, analytically trusted data assets.
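The Type 2 pattern mentioned here keeps history by expiring the current row and appending a new versioned row when a tracked attribute changes. A minimal pure-Python sketch; the row layout (dicts with key, attrs, validity dates, and a current flag) is an illustrative assumption, not a specific warehouse schema:

```python
from datetime import date

def scd2_update(dimension, key, new_attrs, effective):
    """Apply a Type 2 slowly changing dimension update.

    If the tracked attributes changed, close out the current row
    (set valid_to and clear the current flag) and append a new row
    valid from the effective date.
    """
    rows = [dict(r) for r in dimension]  # work on a copy
    current = next((r for r in rows if r["key"] == key and r["current"]), None)
    if current is None or current["attrs"] == new_attrs:
        return rows  # nothing to change
    current["current"] = False
    current["valid_to"] = effective
    rows.append({
        "key": key,
        "attrs": new_attrs,
        "valid_from": effective,
        "valid_to": None,
        "current": True,
    })
    return rows

dim = [{
    "key": "cust-1",
    "attrs": {"city": "Leeds"},
    "valid_from": date(2023, 1, 1),
    "valid_to": None,
    "current": True,
}]
```

Types 1 and 3 differ only in what they retain: Type 1 overwrites in place (no history), Type 3 keeps a single previous value in an extra column.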

Infrastructure as code

Provision and manage cloud infrastructure using Terraform. Containerise solutions using Docker and manage deployments through GitLab CI/CD pipelines and release tagging.

Security & encryption

Apply understanding of both server-side and client-side encryption patterns within AWS. Work within IAM policies and data governance standards appropriate to a regulated government environment.

Requirements

You will apply strong data analysis skills to identify root causes of data issues, work with dimensional data models and slowly changing dimensions, and implement infrastructure as code using Terraform. Familiarity with DWP engineering best practices and the ability to translate customer expectations into applied technical functionality are key to success in this role.

Technical skills required

Languages & analytics

  • Python - primary language for pipeline development and data processing
  • SQL - used for querying, transformation, and validation across data stores
  • PySpark - for distributed data processing using Apache Spark on AWS EMR
  • Familiarity with basic data structures for constructing robust, scalable solutions
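The SQL-for-validation skill above can be sketched with a row-count reconciliation query, using Python's standard-library sqlite3 as a stand-in for the real data stores. Table and column names are hypothetical:

```python
import sqlite3

# In-memory SQLite standing in for a staging area and a warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY);
    INSERT INTO staging_orders VALUES (1), (2), (3);
    INSERT INTO fact_orders VALUES (1), (2);
""")

def missing_rows(conn):
    """Return order_ids present in staging but not yet loaded to the fact table."""
    query = """
        SELECT s.order_id
        FROM staging_orders AS s
        LEFT JOIN fact_orders AS f ON f.order_id = s.order_id
        WHERE f.order_id IS NULL
        ORDER BY s.order_id
    """
    return [row[0] for row in conn.execute(query)]
```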

Data processing & orchestration

  • Apache Spark - understanding of distributed data processing architecture and execution
  • Apache Airflow - configuring DAGs and managing task orchestration at scale
  • Jupyter Notebooks - for exploratory data analysis and pipeline prototyping
  • Understanding of dimensional data models and slowly changing dimensions (SCD Types 1, 2, 3)
  • Data analysis skills to identify root cause of issues within pipelines and data assets

AWS services

  • Amazon EMR - running Spark workloads and reviewing output logs
  • Amazon Athena - ad hoc querying of data in S3
  • Amazon Textract and Comprehend - familiarity with AI/ML document extraction and NLP services
  • AWS S3, IAM, CloudWatch, EC2, ECR - core platform services used day-to-day
  • AWS console proficiency - navigating, configuring, and monitoring services
  • Understanding of server-side and client-side encryption within AWS

Infrastructure, DevOps & delivery

  • Terraform - Infrastructure as Code for provisioning and managing AWS environments
  • Docker - containerisation of data engineering solutions
  • GitLab - source code management, CI/CD pipeline configuration, release tagging, and component versioning
  • Familiarity with DWP engineering best practices
  • Ability to translate customer expectations into applied, functional technical solutions

About the company

Scrumconnect is a leading UK technology consultancy delivering digital transformation across public and private sectors, contributing to over 20% of the UK's major citizen-facing public services. We specialise in cloud engineering, data platforms, and agile delivery, helping clients build scalable, secure, and user-centred digital solutions that create real impact.

Apply for this position