Resiliency Architect, Senior Manager
Job description
US Locations: USA - Hermitage; USA - Cincinnati; USA - Cleveland; USA - Nashville; USA - Pittsburgh; USA - Tampa
Deloitte Global is the engine of the Deloitte network. Our professionals reach across disciplines and borders to develop and lead global initiatives. We deliver strategic programs and services that unite our organization.
Work you'll do
We're seeking a Resiliency Architect to design, implement, and continuously improve our resiliency strategy for our global infrastructure.
You will lead the architecture for high availability, disaster recovery (DR), and business continuity, ensuring our critical workloads withstand failures, incidents, regional outages, and rapid demand shifts. This role partners closely with Platform Engineering, Products and Solutions, Security, and SRE/Operations stakeholders to meet stringent RTO/RPO targets.
The Resiliency Architect will be responsible for the following:
- Establish resilient architecture designs in collaboration with subject matter experts in their respective disciplines, including hosting (cloud and on-premises), network, systems management, and application development.
- Assist with the design of HA/FT patterns for distributed systems, including active-active, active-passive, zone/region failover, and multi-cloud.
- Define and maintain DR plans, including runbooks, recovery tiers, data protection, and replication strategies, aligned to business impact analysis, RTO/RPO, and compliance requirements.
- Lead and manage regular resiliency testing and drive corrective actions.
- Provide risk assessment and dependency mapping to architect solutions that minimize blast radius and reduce or mitigate single points of failure.
- Ensure compliance with regulatory and industry frameworks (e.g., ISO 22301/27001, NIST SP 800-34/53, SOC 2) and internal risk policies.
Requirements
Do you have experience in system design? Do you possess the following?
- Key requirements include expertise in risk assessment, disaster recovery planning, infrastructure reliability, automation, and implementing pattern governance for system recovery.
- System Resilience Modeling: Expertise in analyzing complex systems to eliminate single points of failure (SPOFs) and in designing systems that can withstand load, attacks, and outages.
- Risk Management & Recovery: Deep knowledge of disaster recovery (DR) strategies, including defining Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
- Strategic Planning: Ability to create patterns for application architecture that adhere to resilience standards.
- Stakeholder Engagement: Ability to communicate technical risks, align resilience strategies with business continuity goals, and manage technical debt.
- 8+ years in architecture, platform engineering, or SRE, including 4+ years focused on availability/resiliency/DR.
- Hands-on experience with major cloud service providers (Azure, AWS, or GCP).
- Proven track record designing HA/DR for mission-critical applications (databases, servers, network, and general infrastructure).
- Demonstrated experience leading DR testing efforts.
- Excellent communication, stakeholder management, and documentation skills.
Preferred Certifications
- Certified Business Continuity Professional (CBCP) or ISO 22301 Lead Implementer