Systems Research Engineer
European Tech Recruit
Edinburgh, United Kingdom
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Job location: Edinburgh, United Kingdom
Tech stack
Artificial Intelligence
Distributed Systems
Fault Tolerance
Systems Theories
Python
Rapid Prototyping Process
AI Infrastructure
Load Balancing
PyTorch
Large Language Models
TensorRT
Job description
- Distributed Systems R&D: Architecting components for CPU, GPU, and NPU clusters with a focus on modularity and extreme scalability.
- Performance Engineering: In-depth profiling of large-scale inference pipelines, specifically focusing on KV cache management and heterogeneous memory scheduling.
- AI Serving: Optimising high-throughput frameworks (vLLM, Ray Serve, PyTorch Distributed) to ensure low-latency, multi-tenant performance.
- Research Leadership: Contributing to top-tier venues (OSDI, NSDI, EuroSys, MLSys) and driving those innovations into real-world production.
Who You Are
We are looking for "systems-first" thinkers: engineers who understand what happens under the hood of a cluster.
Requirements
- Education: A Bachelor's or Master's in CS, EE, or a related field (PhD highly preferred).
- The Stack: Strong proficiency in C/C++ for systems work, with Python for rapid prototyping.
- Expertise: Hands-on experience with LLM serving frameworks (vLLM, Ray Serve, TensorRT-LLM) and distributed algorithms.
- Mindset: A solid grounding in systems research methodology and performance profiling tools.
The "Value Add" (Desired):
- A PhD focused on distributed computing or AI infrastructure.
- A track record of publications at major conferences (NeurIPS, ICML, ICLR, etc.).
- Deep knowledge of load balancing, fault tolerance, and resource orchestration in massive AI clusters.
About the company
One of the largest telecommunications companies in the world is looking for an experienced researcher to join its team in Edinburgh.
The Vision
We are currently scaling a world-class research team in Edinburgh to redefine the foundational software stack for the LLM era. As AI transitions from experimental workloads to "agentic" and "AI-native" infrastructure, we are building the super-node clusters and distributed architectures that will power the next generation of global data centres.
This is a unique hybrid role positioned at the intersection of academic-grade systems research and industrial-scale engineering. You won't just be writing papers; you'll be prototyping and deploying the frameworks that manage GPU/NPU clusters at massive scale.