GPU Systems Engineer

Selby Jennings
Charing Cross, United Kingdom
21 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£113K

Job location

Charing Cross, United Kingdom

Tech stack

Artificial Intelligence
Systems Engineering
C++
Configuration Management
Nvidia CUDA
Linux
Python
Network Layer
Performance Tuning
Remote Direct Memory Access
Ansible
Scripting (Bash/Python/Go/Ruby)
Hardware Testing
Operational Systems
Puppet

Job description

A global trading firm is seeking a GPU Engineer to help scale and evolve its high-performance computing (HPC) and AI research environment. This role sits within the Research and Development team and involves working on globally distributed infrastructure that supports trading and research operations around the clock.

You'll collaborate with experts across compute, storage, operating systems and automation to design and optimise large-scale GPU clusters, troubleshoot performance bottlenecks and automate deployment and monitoring across thousands of nodes. This is a high-impact role with broad scope, touching everything from hardware testing to performance engineering.

Responsibilities

  • Design, build and optimise large-scale distributed GPU compute clusters
  • Identify and resolve performance bottlenecks across compute, storage and networking layers
  • Collaborate with research and engineering teams to profile, benchmark and fine-tune GPU-based workloads
  • Automate deployment, monitoring and troubleshooting across thousands of nodes
  • Support evolving workloads and infrastructure needs across trading and research teams
  • Own infrastructure projects from concept through implementation and support
  • Test and deploy new hardware and software, working closely with vendors to resolve complex issues

Requirements

  • 5+ years' experience in Linux systems engineering within HPC, AI or distributed infrastructure environments
  • Strong background in Linux system installation, performance tuning and troubleshooting
  • Expertise in diagnosing and optimising distributed GPU workloads
  • Deep understanding of GPU performance and tuning
  • Proficiency in Python scripting and automation frameworks
  • Experience with CUDA or C/C++ is a plus
  • Familiarity with NVIDIA technologies such as NCCL, GPUDirect RDMA and NVLink
  • Experience with configuration management tools (e.g. Salt, Ansible, Puppet, Chef)
  • Comfortable diagnosing issues across hardware, OS and network layers
  • Strong communication and organisational skills, with the ability to collaborate across technical teams
  • Thrives in fast-paced environments and is motivated by high-impact work

About the company

The firm applies a scientific approach to trading and has built one of the world's most advanced computing environments for research and development. It values openness, collaboration and innovation, welcoming diverse perspectives and encouraging contributions from all team members. Whether writing elegant code, solving complex problems or sharing a meal, the culture is one of togetherness and continuous improvement.
