Hybrid Hardware & Software Support Engineer - HPC

Aeroficial Intelligence

Reading, United Kingdom

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Reading, United Kingdom

Tech stack

Artificial Intelligence

Bash

Configuration Management

Linux

General Parallel File Systems

Monitoring of Systems

Icinga

InfiniBand

Python

Kernel-Based Virtual Machine

Linux System Administration

Routing

OpenStack

Ansible

Prometheus

Subsystems

TCP/IP

Virtualization Technology

Ceph

Scripting (Bash/Python/Go/Ruby)

High Performance Computing

Grafana

Git

Kubernetes

Information Technology

Slurm

Puppet

Docker

Job description

The role is based primarily on-site at a customer facility near Reading, Berkshire, with occasional support for additional HPC installations across Europe.

Bull's High-Performance Computing (HPC), Artificial Intelligence & Quantum Business Unit is seeking a Hybrid Hardware & Software Support Engineer to join our HPC Services team. This is a highly visible, customer-facing operational role supporting advanced HPC infrastructures in the UK. You will work across computing, storage, and networking layers, ensuring the deployment, stability, and performance of large-scale Linux-based systems. While prior HPC experience is an advantage, it is not mandatory - strong Linux and infrastructure engineers eager to grow into HPC & AI are encouraged to apply.

Deployment & System Bring-Up

Install, configure, and integrate HPC cluster components (compute, storage, networking).
Perform system installation, initial configuration, and operational readiness checks.
Apply patches, updates, and conduct routine maintenance activities.

Hybrid Hardware & Software Support

Provide Level 1 and Level 2 operational support for HPC environments.
Diagnose and resolve issues involving Linux operating systems, enterprise server hardware, high-speed interconnects, and storage subsystems.
Conduct root cause analysis and implement corrective actions.
Escalate appropriately within the global support organisation when needed.

Operations & Incident Handling

Monitor system health and respond to incidents proactively.
Perform troubleshooting in secure, mission-critical environments.
Maintain detailed and accurate documentation of incidents and resolutions.

Customer Interface

Act as the primary technical contact on-site.
Communicate effectively regarding incidents, planned maintenance, and system status.
Build trusted relationships with customer technical stakeholders.
Represent Bull professionally in sensitive and high-profile environments.

Requirements

Strong Linux expertise (Red Hat and/or Debian-based environments)
Solid understanding of enterprise server hardware (CPU, memory, storage, diagnostics)
Scripting skills in Bash and/or Python
Strong networking fundamentals (TCP/IP, routing, switching, security basics)
Hands-on experience with infrastructure deployment, configuration, and maintenance
Excellent troubleshooting and analytical abilities
Proactive mindset and ability to work independently

Desirable Skills & Experience

Valuable, but not mandatory:

Experience with HPC clusters
High-speed networking (40/100GbE, InfiniBand)
Virtualisation technologies (KVM, OpenStack)
Storage systems (Ceph, SAN/NAS)
Parallel filesystems (Lustre, GPFS, BeeGFS)
Containers (Docker, Podman, Kubernetes)
Configuration management (Ansible, Puppet)
Monitoring and observability tools (Prometheus, Grafana, Icinga)
Workload managers (Slurm, PBS Pro)
Git version control

The ideal candidate:

Is hands-on, operationally focused, and detail-oriented
Thrives in secure, mission-critical environments
Approaches troubleshooting methodically, even under pressure
Communicates clearly with both technical and non-technical stakeholders
Takes full ownership of incidents through to resolution
Is motivated to learn continuously and expand their technical expertise

Education & Experience

Option 1:

Degree in Computer Science, Engineering, or a related field, plus at least 2 years of relevant experience

Option 2:

5+ years of relevant industry experience

Strong early-career candidates with solid technical foundations will also be considered.

Benefits & conditions

Working on advanced HPC and digital infrastructure projects
Continuous learning and technical skill development
Career growth within a global technology organisation
Participation in internal initiatives and community-focused activities

What happens next?

Your application will be reviewed (1-2 business days)
Short-listed candidates will be contacted for a discussion with HR
Interview with management team
Feedback (1-10 business days after the interview)

Join us! Here, your ideas, your curiosity and your technical excellence directly shape the next era of advanced computing - unlocking enterprise value, accelerating scientific progress and driving positive impact for society.
