Infrastructure Engineer - London
Job description
The role blends Linux systems administration (Ubuntu), containerized compute (LXD/LXC, some Docker), networking, and datacenter operations.
You will partner with engineering, network, and security teams to ensure reliability, performance, and change control in a 24x7, market-facing environment.
This is a production-oriented role: you'll prepare, review and execute changes, troubleshoot live issues, execute maintenance windows, and continuously improve our platform through automation and rigorous documentation.
Our Environment
- Servers: Dell, HPE, Supermicro.
- Storage: LVM, software and hardware RAID (mdadm, MegaRAID, LSI, ...).
- Containers: LXD/LXC (primary), some Docker.
- Networking (day-2 ops): VLANs, LACP, ACLs, routing basics; vendors include Dell, Supermicro, Arista, Juniper, VyOS.
- Applications & Data: MySQL, Elasticsearch, Kafka, Java, Apache HTTPD, ...
- Automation & IaC: Git/GitLab, Ansible, NetBox, Chef, Terraform; scripting with Bash/Python.
- Monitoring/Observability: Centreon, Dynatrace.
What You'll Do
- Operate and improve Linux fleets (Ubuntu) in production.
- Manage HPC bare-metal and LXD/LXC container platforms.
- Provide level-3 incident response for infrastructure issues (systems, containers, network paths, storage), restoring service within SLAs and driving post-mortems.
- Own Platforms datacenter operations in Slough: rack/stack, cabling, optics, power planning, server installation, console/OOB, inventory management in NetBox, RMA logistics, and vendor coordination (Equinix Smart Hands, carriers, OEMs).
- Perform day-2 network operations on switches and firewalls (ACLs, VLANs, LAGs, routing basics), and collaborate closely with network engineering on changes.
- Automate with Ansible and Chef for configuration management and Terraform for IaC on AWS where applicable. Build reliable tooling for repeatable ops (config generation, pre-change checks, deployments, and validation).
- Contribute to change management (runbooks, maintenance windows, rollback plans) and keep documentation current (network diagrams, inventories, SOPs).
- Participate in a Follow-the-Sun operations model, coordinating with your EMEA/APAC peers. Standard business hours are aligned to Central European Time, with flexibility for maintenance windows.
- Rotational weekend work (Friday/Saturday/Sunday) for planned changes and datacenter work; a comp day is granted during the week.
Requirements
o 2-3+ years operating Linux (Ubuntu, CentOS, Red Hat) in production environments.
o Occasional on-call availability outside standard business hours is required to respond to urgent or critical operational issues.
o Previous datacenter work exposure: rack/stack, structured cabling (fiber/copper), PDUs, console/OOB, vendor/Smart Hands coordination, and accurate inventory. Without prior experience, a willingness to learn and work in such environments is required.
o Containers: production exposure to LXC or Docker and an understanding of their inner workings.
o Server hardware & storage: LVM, software RAID, MegaRAID tooling, firmware/BIOS/BMC (iDRAC/iLO/IPMI), and hands-on diagnostics and replacements.
o Networking fundamentals for day-to-day ops: VLANs, LACP, trunking, ACLs, static routes, BGP, DNS/DHCP, link/MTU issues; ability to execute well-scoped changes on Dell/Arista/Juniper/VyOS under peer review.
o Automation & SCM: Bash/Python, Git/GitLab; experience with Chef, Ansible, or Puppet in production.
o Clear runbook-style writing, disciplined change control, and calm, structured troubleshooting under time pressure.
Nice to have:
o Familiarity with Equinix processes (cross-connects, tickets, remote hands) and carrier coordination.
o Ops exposure to NetBox, MySQL, Elasticsearch, Kafka, Java services, Apache; ability to collaborate with app teams on infra-adjacent issues.
o Experience with Centreon and Dynatrace (or equivalent monitoring/observability stacks).
o Config management/IaC depth (Ansible, Puppet, Terraform modules, secrets management), and CI pipelines in GitLab.
o Deeper networking (EVPN/VXLAN, BGP, multicast) and/or traffic engineering.