Site Reliability Engineer (SRE)
Job description
As a Site Reliability Engineer, you will play a key role in driving reliability, performance, and operational excellence across Ricoh's hybrid cloud and on-prem environments. You will help shape SRE practices, support incident and problem management, embed automation, and ensure infrastructure operations meet the highest security and compliance standards, including ISO 27001.
This is a hands-on technical role with significant influence across engineering, architecture, security, and operational teams.
What you will be doing
- Delivering against SLIs and SLOs, and managing error budgets for core services
- Implementing standards for availability, latency, performance, capacity, and scalability
- Leading and contributing to root-cause analysis and major incident reviews
- Supporting a blameless post-mortem culture with clear action tracking
- Defining and implementing SRE practices, tooling, and engineering standards
- Driving infrastructure-as-code and automation across Azure and on-prem environments
- Improving image bakery pipelines for secure, repeatable server builds
- Embedding observability using metrics, logs, traces, and effective alerting
- Ensuring all practices align with ISO 27001 and internal security frameworks
- Managing automated patching, vulnerability remediation and configuration compliance
- Building dashboards and KPIs for reliability, MTTR, change failure rate, capacity and operational trends
- Reducing operational toil through automation and improved tooling
- Supporting the delivery and evolution of the SRE roadmap aligned to Ricoh's transformation strategy
Requirements
- Proven experience in Site Reliability Engineering or Production Engineering within medium to large-scale environments
- Strong knowledge of Azure and on-prem infrastructure (IaaS, PaaS, networking, identity, storage)
- Hands-on experience with infrastructure-as-code (Terraform, ARM/Bicep), configuration management (Ansible, PowerShell DSC), and CI/CD tooling (Azure DevOps, GitHub Actions)
- Experience with monitoring and observability stacks
- Solid understanding of OS fundamentals (Windows/Linux), security, and networking
- Background in scripting or software development (PowerShell, Python, Go)
- Experience with containers and orchestration (Docker, Kubernetes, AKS)
- Familiarity with ITSM practices and platforms such as ServiceNow
- Experience operating in ISO 27001 or similar regulated environments
Business & Interpersonal Skills
- Experience contributing to incident response, root-cause analysis and continuous improvement
- Ability to influence without direct authority and challenge the status quo
- Comfortable working with security, architecture, product and operational teams
- Strong communication skills with the ability to translate complex technical issues to non-technical audiences
- Calm and effective in high-pressure situations, especially major incidents
Benefits & conditions
- A competitive salary package
- Industry-leading benefits
Ricoh is an exceptional place to work, with a strong emphasis on career development for the right individuals. This is a role where you can excel in a fast-paced environment and succeed within a thriving organisation.
This is an excellent opportunity to join a global company where you can capitalise and build on your experience.