Systems Research Engineers
Job description
Distributed Systems Research & Development
- Architect, implement, and evaluate distributed system components for emerging AI and data-intensive workloads.
- Design modular and scalable infrastructure spanning heterogeneous clusters (CPU, GPU, accelerators).
- Develop efficient serving and scheduling systems optimized for large-scale AI workloads.
Performance Optimization & Profiling
- Conduct deep profiling and performance tuning of large-scale inference and data pipelines.
- Optimize key-value cache management and heterogeneous memory scheduling.
- Improve high-throughput inference serving using modern distributed ML frameworks.
- Apply systematic performance analysis methodologies to identify bottlenecks and scalability constraints.
Scalable Model Serving Infrastructure
- Develop frameworks enabling multi-tenant, low-latency, and fault-tolerant AI serving across distributed environments.
- Research techniques for:
  - Cache sharing
  - Data locality optimization
  - Resource orchestration
  - Cluster-level scheduling
- Prototype and evaluate new serving and inference architectures.
Research & Publications
- Translate novel system designs into publishable research contributions at leading systems and ML venues.
- Drive internal adoption of innovative methods and architectural improvements.
Cross-Team Collaboration
- Communicate technical insights and evaluation results clearly to multidisciplinary engineering and research teams.
- Collaborate across global research groups to align on long-term infrastructure strategy.
Requirements
We are seeking Systems Research Engineers with a strong interest in computer systems, distributed AI infrastructure, and performance optimization. These roles are well suited to recent PhD graduates or outstanding BSc/MSc engineers aiming to develop research-driven engineering expertise in operating systems, distributed systems, AI model serving, and machine learning infrastructure.
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
- Strong knowledge of:
  - Distributed systems
  - Operating systems
  - Machine learning systems
  - AI inference serving infrastructure
- Hands-on experience with LLM serving frameworks and distributed cache optimization.
- Proficiency in C/C++ for systems development.
- Experience using Python for research prototyping.
- Solid understanding of distributed algorithms and systems research methodology.
- Familiarity with profiling and performance analysis tools.
- Strong communication skills and a collaborative mindset.
Preferred Qualifications
- PhD in systems, distributed computing, or large-scale AI infrastructure.
- Publications in top-tier systems or ML conferences.
- Experience with:
  - Load balancing
  - State management
  - Fault tolerance
  - Resource scheduling in inference clusters
- Practical experience designing, deploying, or profiling high-performance cloud or AI infrastructure.