HPC AI Cloud Engineer

World Wide Technology

Manchester, United Kingdom

3 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Manchester, United Kingdom

Tech stack

Build Automation

Bash

Cloud Computing

Profiling

Nvidia CUDA

InfiniBand

Python

Performance Tuning

Remote Direct Memory Access

Ansible

TensorFlow

PyTorch

Kubernetes

Infrastructure Automation Frameworks

Slurm

Terraform

Job description

Design and execute HPC & AI performance benchmarks (training, inference, scientific workloads)
Provision and optimize GPU/TPU-based infrastructure on GCP (A3/A4, TPU pods)
Analyze performance across frameworks (PyTorch, TensorFlow, JAX, CUDA, ROCm)
Identify system bottlenecks (compute, memory, network, I/O)
Build automation tools for benchmarking and reporting
Collaborate with teams to align workloads with optimal architecture

Requirements

Strong experience with GCP (Compute Engine, GKE, Storage, Networking)
Hands-on experience with NVIDIA (CUDA/NCCL), AMD (ROCm), and TPUs (XLA/JAX/TF)
Solid knowledge of HPC concepts (MPI, RDMA, InfiniBand, Slurm/Kubernetes)
Experience with performance benchmarks (MLPerf, HPL, NCCL, STREAM)
Proficiency in Python, Bash, and IaC tools (Terraform/Ansible)
Ability to analyze output from profiling tools (Nsight, TensorBoard, PyTorch Profiler)

Candidates will be required to go through background checks before commencing the contract.

Candidates must be eligible to live and work in the specified work location. Occasional travel may be required. Only successful candidates will be contacted.

About the company

World Wide Technology (WWT) is a global technology integrator and supply chain solutions provider. Through our culture of innovation, we inspire, build, and deliver business results, from idea to outcome. World Wide Technology UK is looking for a hands-on Cloud Engineer with strong expertise in HPC and AI/ML performance workloads on Google Cloud Platform (GCP). The role focuses on benchmarking, optimizing, and validating performance across advanced accelerator platforms including NVIDIA GPUs, AMD GPUs, and Google TPUs.
