Head of Support & Service Reliability Engineering

Sycurio

Guildford, United Kingdom

14 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Guildford, United Kingdom

Tech stack

API

Software as a Service

Distributed Systems

Payment Systems

Monitoring of Systems

Release Management

Reliability Engineering

Grafana

MTTR

API Design

Job description

We are seeking a Head of Support & Service Reliability to lead and evolve our global support function into a proactive, platform-integrated reliability capability. This role is an exciting and dynamic opportunity for an outcome-focused individual. Sycurio is at a critical inflection point as we transition from a single-tenant architecture to a multi-tenant SaaS platform, which requires a fundamental shift from reactive ticket handling to systemic reliability, observability, and customer experience management at scale. You will own the end-to-end operational integrity of the platform, ensuring availability, performance, and customer trust, while partnering closely with Engineering, Product, and Customer-facing teams. You will also be a key contributor to our GRR goal of 90%+.

Sycurio employs a strategic managed service provider that supplies the people, tooling, and day-to-day execution across all support tiers. The Head of Support sets the standards, governs vendor performance, and ensures every aspect of the support experience - from incident response to customer satisfaction - meets enterprise-grade expectations.

Key Responsibilities

Service Reliability & Platform Stability

- Own platform availability, performance, and reliability across all tenants
- Reduce incident frequency, severity, and blast radius
- Establish and drive Service Reliability Engineering (SRE) principles
- Ensure scalability and operational readiness of a multi-tenant platform

Incident Management & Response

- Implement and lead a structured incident management framework (P1-P4)
- Act as executive owner of major incidents (P1/P2)
- Drive improvements in:
  - Mean Time to Detect (MTTD)
  - Mean Time to Resolve (MTTR)
- Ensure clear, consistent internal and external communication during incidents

Observability & Monitoring

- Define and implement a comprehensive observability strategy, including:
  - Technical telemetry (infrastructure, application, APIs)
  - Business telemetry (transactions, payment success rates, usage)
  - End-to-end customer journey visibility
- Ensure issues are detected proactively, not customer-reported
- Partner with Product and Engineering to embed telemetry into the platform

Support Operations (L1-L3)

- Lead global support teams ensuring high-quality, SLA-driven case management
- Define and enforce support processes, tooling, and performance standards
- Improve key metrics:
  - First response time
  - Resolution time
  - Reopen rate
  - Escalation quality

Platform Operations & Change Management

- Oversee operational aspects of the platform, including:
  - Release management and deployment safety, ensuring all releases are observable, reversible, and low-risk
  - Change control processes
  - Environment consistency across staging and production
- Own the visibility and continuous improvement of delivery and recovery performance using the DORA metrics, in partnership with Engineering

Issue Management & Root Cause Discipline

- Establish rigorous Root Cause Analysis (RCA) standards
- Identify and eliminate systemic issues (not just symptom fixes)
- Track and reduce recurring incidents
- Feed insights into Product and Engineering roadmaps

Customer Experience & Commercial Alignment

- Align support with Customer Success and Sales
- Ensure coordinated communication during incidents
- Protect customer relationships during critical events
- Introduce tenant-aware impact assessment (ARR, strategic accounts, regulatory exposure)
- Support enterprise-grade expectations for transparency and reliability

Cross-functional Leadership

- Act as the bridge between Engineering, Product, and Customer Delivery / Success
- Embed supportability and operational readiness into pre-sales (Stage 4/5 governance), product development, and deployment processes

Managed Service Governance

- Chair regular operational reviews and quarterly business reviews with the managed service leadership team
- Own the managed service scorecard - defining KPIs, reviewing performance data, and driving accountability for misses
- Manage contract compliance, SLA adherence, and commercial exposure from managed service underperformance
- Lead continuous improvement programs jointly with the managed service provider, including tooling upgrades, process redesigns, and training investments
- Maintain an escalation path for systemic or persistent managed service failure, up to and including remediation planning

Requirements

- 10+ years in Support, Platform Operations, or SRE leadership roles
- Proven experience in multi-tenant SaaS and legacy environments
- Strong understanding of distributed systems, incident management at scale, and observability frameworks
- Track record of building and scaling high-performing operational teams
- Experience in outsourced or hybrid operational models
- Experience working cross-functionally with Engineering and Product

Preferred

- Background in payments, security, or compliance-driven environments (e.g., PCI)
- Experience with API-first platforms and telephony/payment flows
- Familiarity with observability tools (e.g., Grafana)
