GCP Data Engineer

Harnham
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
£169K

Job location

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Big Data
Google BigQuery
Cloud Computing
Code Review
Data Files
ETL
Python
Machine Learning
Cloud Services
SQL Databases
Unstructured Data
Google Cloud Platform
PyTorch
Build Management
Machine Learning Operations
Data Pipelines

Job description

Our client is now looking for a GCP Data Engineer to join a multidisciplinary team responsible for building and operating robust, cloud-native data infrastructure that supports ML workloads, particularly PyTorch-based pipelines.

The Role

You'll focus on designing, building, and maintaining scalable data pipelines and storage systems in Google Cloud, supporting ML teams by enabling efficient data loading, dataset management, and cloud-based training workflows.

You'll work closely with ML engineers and researchers, ensuring that large volumes of unstructured and structured data can be reliably accessed, processed, and consumed by PyTorch-based systems.

Responsibilities

  • Design and build cloud-native data pipelines using Python on GCP

  • Manage large-scale object storage for unstructured data (Google Cloud Storage preferred)
  • Support PyTorch-based workflows, particularly around data loading and dataset management in the cloud
  • Build and optimise data integrations with BigQuery and SQL databases
  • Ensure efficient memory usage and performance when handling large datasets
  • Collaborate with ML engineers to support training and experimentation pipelines (without owning model development)
  • Implement monitoring, testing, and documentation to ensure production-grade reliability
  • Participate in agile ceremonies, code reviews, and technical design discussions

Requirements

  • Strong Python development experience
  • Hands-on experience with cloud object storage for unstructured data (Google Cloud Storage preferred; AWS S3 also acceptable)
  • PyTorch experience, particularly:
    • Dataset management
    • Data loading pipelines
    • Running PyTorch workloads in cloud environments
    We are not looking for years of PyTorch experience; one or two substantial 6-12 month projects are ideal
  • 5+ years cloud experience, ideally working with large numbers of files in cloud buckets

Nice to Have

  • Experience with additional GCP services, such as:
    • Cloud Run
    • Cloud SQL
    • Cloud Scheduler
  • Exposure to machine learning workflows (not ML engineering)
  • Some pharma or life sciences experience, or a genuine interest in working with domain-specific scientific data

About the company

We're working with a global healthcare and AI research organisation at the forefront of applying data engineering and machine learning to accelerate scientific discovery. Their work supports large-scale, domain-specific datasets that power research into life-changing treatments.

Apply for this position