Head of Support & Service Reliability Engineering

Sycurio
Guildford, United Kingdom
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Guildford, United Kingdom

Tech stack

API
Software as a Service
Distributed Systems
Payment Systems
Monitoring of Systems
Release Management
Reliability Engineering
Grafana
MTTR
API Design

Job description

We are seeking a Head of Support & Service Reliability to lead and evolve our global support function into a proactive, platform-integrated reliability capability. This role is an exciting and dynamic opportunity for an outcome-focused individual. Sycurio is at a critical inflection point as we transition from a single-tenant architecture to a multi-tenant SaaS platform, which requires a fundamental shift from reactive ticket handling to systemic reliability, observability, and customer experience management at scale. You will own the end-to-end operational integrity of the platform, ensuring availability, performance, and customer trust, while partnering closely with Engineering, Product, and customer-facing teams. You will also be a key contributor to our GRR goal of 90%+.

Sycurio employs a strategic managed service provider that provides the people, tooling, and day-to-day execution across all support tiers. The Head of Support sets the standards, governs vendor performance, and ensures every aspect of the support experience, from incident response to customer satisfaction, meets enterprise-grade expectations.

Key Responsibilities

Service Reliability & Platform Stability

Own platform availability, performance, and reliability across all tenants
Reduce incident frequency, severity, and blast radius
Establish and drive Service Reliability Engineering (SRE) principles
Ensure scalability and operational readiness of a multi-tenant platform

Incident Management & Response

Implement and lead a structured incident management framework (P1-P4)
Act as executive owner of major incidents (P1/P2)
Drive improvements in:

Mean Time to Detect (MTTD)
Mean Time to Resolve (MTTR)

Ensure clear, consistent internal and external communication during incidents

Observability & Monitoring

Define and implement a comprehensive observability strategy, including:

Technical telemetry (infrastructure, application, APIs)
Business telemetry (transactions, payment success rates, usage)
End-to-end customer journey visibility

Ensure issues are detected proactively, not customer-reported
Partner with Product and Engineering to embed telemetry into the platform

Support Operations (L1-L3)

Lead global support teams ensuring high-quality, SLA-driven case management
Define and enforce support processes, tooling, and performance standards
Improve key metrics:

First response time
Resolution time
Reopen rate
Escalation quality

Platform Operations & Change Management

Oversee operational aspects of the platform, including:

Release management and deployment safety, ensuring all releases are observable, reversible, and low-risk
Change control processes
Environment consistency across staging and production

Own the visibility and continuous improvement of delivery and recovery performance using the DORA metrics, in partnership with Engineering

Issue Management & Root Cause Discipline

Establish rigorous Root Cause Analysis (RCA) standards
Identify and eliminate systemic issues (not just symptom fixes)
Track and reduce recurring incidents
Feed insights into Product and Engineering roadmaps

Customer Experience & Commercial Alignment

Align support with Customer Success and Sales
Ensure coordinated communication during incidents
Protect customer relationships during critical events
Introduce tenant-aware impact assessment (ARR, strategic accounts, regulatory exposure)
Support enterprise-grade expectations for transparency and reliability

Cross-functional Leadership

Act as the bridge between Engineering, Product, and Customer Delivery / Success
Embed supportability and operational readiness into pre-sales (Stage 4/5 governance), product development, and deployment processes

Managed Service Governance

Chair regular operational reviews and quarterly business reviews with the managed service leadership team
Own the managed service scorecard: defining KPIs, reviewing performance data, and driving accountability for misses
Manage contract compliance, SLA adherence, and commercial exposure from managed service underperformance
Lead continuous improvement programs jointly with the managed service provider, including tooling upgrades, process redesigns, and training investments
Maintain an escalation path for systemic or persistent managed service failure, up to and including remediation planning

Requirements

10+ years in Support, Platform Operations, or SRE leadership roles
Proven experience in multi-tenant SaaS and legacy environments
Strong understanding of distributed systems, incident management at scale, and observability frameworks
Track record of building and scaling high-performing operational teams
Experience in outsourced or hybrid operational models
Experience working cross-functionally with Engineering and Product

Preferred

Background in payments, security, or compliance-driven environments (e.g., PCI)
Experience with API-first platforms and telephony/payment flows
Familiarity with observability tools (e.g., Grafana)
