Platform Reliability Engineer (Operations)
Job description
Ricoh is currently recruiting for a Platform Reliability Engineer (Operations) based in London. You will focus on operating, stabilising, and continuously improving live platforms, ensuring services meet defined availability, performance, security, and compliance standards. The role combines deep operational expertise with strong automation, Infrastructure as Code (IaC), and observability capability to reduce toil, improve recovery, and enable predictable service outcomes.
You will lead M365 migrations on-site across a range of organisations, working with business stakeholders, application owners, security, architecture, and service teams to translate requirements into scalable, secure, cloud-native or cloud-aligned solutions, with a strong focus on automation, standardisation, and reuse.
Ricoh transforms organisations, using innovative technologies and services that enable you as an individual to work smarter. This is what we call "empowering digital workplaces".
The entire Ricoh workforce enjoys our pioneering and innovative ways of working. We like to call it "imagine. change."; it is the ethos of our brand and how we drive positive change for ourselves and others. Our teams are embracing change and fostering new ways of working, and we have never been more resolute in our mission: "you work for us, and we'll work for you".
What you will be doing
- Deliver standards for availability, latency, performance, capacity, and scalability.
- Lead M365 migrations, acting as the main point of contact and supporting a range of stakeholders.
- Take part in root-cause analysis and problem management for major incidents.
- Champion blameless post-mortem culture and ensure actions are tracked and closed.
- Drive infrastructure-as-code and automation across Azure and co-lo environments.
- Evolve the image bakery pipeline for secure, repeatable server images.
- Embed observability using metrics, logs, traces, and alerting tools.
- Partner with SRE and helpdesk teams to deliver service.
- Oversee automated patching, vulnerability remediation, and configuration compliance.
- Introduce KPIs and dashboards for reliability, incident trends, MTTR, change failure rate, and capacity.
- Travel occasionally to data centres and offices to ensure projects and services meet business requirements.
Requirements
- Strong practical knowledge of infrastructure, cloud, and operations, beyond task execution.
- M365 migration experience, ideally within large or matrixed environments.
- Strong background in Azure (IaaS, PaaS, networking, identity, storage) and on-prem data centre operations.
- Hands-on skills with infrastructure-as-code (Terraform, ARM/Bicep), configuration management (Ansible, PowerShell DSC), and CI/CD pipelines (Azure DevOps, GitHub Actions).
- Experience with monitoring/observability tools, alert design, and dashboarding.
- Knowledge of networking, security, and OS fundamentals (Windows/Linux).
- Experience operating in ISO 27001 or similar regulated environments.
- Experience integrating with ITSM platforms (e.g., ServiceNow) and aligning with ITIL processes.
- Understanding of the business and application impact of infrastructure decisions.
- Working knowledge of security, architecture, and vendor/commercial considerations to support informed decision-making.
Benefits & conditions
- A competitive salary package
- Industry-leading benefits
Ricoh is an exceptional place to work, with a strong emphasis on career development for the right individuals. This is a role where you can excel within a fast-paced environment and succeed within a thriving organisation.
This is an excellent opportunity to join a global company where you can truly capitalise on and build upon your own experience.