GPU Systems Engineer

Selby Jennings

Charing Cross, United Kingdom

21 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£113K

Job location

Charing Cross, United Kingdom

Tech stack

Artificial Intelligence

Systems Engineering

C++

Configuration Management

Nvidia CUDA

Linux

Python

Network Layer

Performance Tuning

Remote Direct Memory Access

Ansible

Scripting (Bash/Python/Go/Ruby)

Hardware Testing

Operational Systems

Puppet

Job description

A global trading firm is seeking a GPU Engineer to help scale and evolve its high-performance computing (HPC) and AI research environment. This role sits within the Research and Development team and involves working on globally distributed infrastructure that supports trading and research operations around the clock.

You'll collaborate with experts across compute, storage, operating systems and automation to design and optimise large-scale GPU clusters, troubleshoot performance bottlenecks and automate deployment and monitoring across thousands of nodes. This is a high-impact role with broad scope, touching everything from hardware testing to performance engineering.

Responsibilities

Design, build and optimise large-scale distributed GPU compute clusters
Identify and resolve performance bottlenecks across compute, storage and networking layers
Collaborate with research and engineering teams to profile, benchmark and fine-tune GPU-based workloads
Automate deployment, monitoring and troubleshooting across thousands of nodes
Support evolving workloads and infrastructure needs across trading and research teams
Own infrastructure projects from concept through implementation and support
Test and deploy new hardware and software, working closely with vendors to resolve complex issues

Requirements

5+ years' experience in Linux systems engineering within HPC, AI or distributed infrastructure environments
Strong background in Linux system installation, performance tuning and troubleshooting
Expertise in diagnosing and optimising distributed GPU workloads
Deep understanding of GPU performance and tuning
Proficiency in Python scripting and automation frameworks
Experience with CUDA or C/C++ is a plus
Familiarity with NVIDIA technologies such as NCCL, GPUDirect RDMA and NVLink
Experience with configuration management tools (e.g. Salt, Ansible, Puppet, Chef)
Comfortable diagnosing issues across hardware, OS and network layers
Strong communication and organisational skills, with the ability to collaborate across technical teams
Thrives in fast-paced environments and is motivated by high-impact work

About the company

The firm applies a scientific approach to trading and has built one of the world's most advanced computing environments for research and development. It values openness, collaboration and innovation, welcoming diverse perspectives and encouraging contributions from all team members. Whether writing elegant code, solving complex problems or sharing a meal, the culture is one of togetherness and continuous improvement.