Staff Cyber Resilience Engineer
Job description
You will work with a high-caliber engineering team, have direct influence on our security architecture, and lead recovery exercises that test the organization end-to-end.
What You'll Do
Own Our Recovery Architecture
- Design and build our Isolated Recovery Environment - a hardened AWS account with immutable vaults that break the attacker's kill chain before it reaches our data.
- Threat model our environment with a deep understanding of cloud-native attack patterns: IAM privilege escalation, backup deletion, ransomware persistence, and lateral movement across accounts.
- Validate and continuously improve backup configurations to ensure recoverability, not just existence.
Standardize and Automate Infrastructure
- Lead our transition to 100% Infrastructure as Code. Every asset (VPCs, IAM roles, security groups) must be defined in Terraform so we can redeploy the entire stack into a clean account via an automated pipeline.
- Build automated recovery workflows that can tear down a compromised environment and bootstrap a fresh, hardened one from verified code and clean data.
- Write and maintain executable recovery playbooks that detail the exact API calls and CLI commands needed to restore the application - tested, versioned, and runnable, not static documents.
Validate, Test, and Lead Exercises
- Develop automated scripts (Python or Go) to smoke test recovered data and validate integrity post-restoration.
- Lead regular hands-on recovery drills that simulate total loss of a critical environment and full recovery into a secondary clean account. Own the after-action process and drive improvements.
Drive Engineering Standards
- Act as the resilience authority for the engineering organization - shaping high-availability architecture decisions, influencing design reviews, and raising the floor on how we think about recoverability.
- Partner with the Site Reliability Engineering team on multi-region deployments and high-availability design, ensuring cyber resilience is embedded in architecture from the start.
- Champion IaC and immutable infrastructure practices across teams, not just within your own workstream.
Requirements
- 8+ years of experience in complex cloud environments (any of AWS/GCP/Azure), including at least 3 years in AWS. EKS/Kubernetes experience is a strong plus.
- Strong Terraform skills. You should be able to modularize complex environments so they are environment-agnostic.
- Hands-on familiarity with the Secure Vault pattern: protecting data in a separate, highly restricted AWS account with tight network controls.
- Advanced shell scripting and proficiency in either Python or Go to automate restoration tasks that native AWS tooling doesn't cover.
- Experience with CI/CD tooling (Scalr, GitHub Actions, or equivalent) to enable broad adoption of recovery pipelines across the organization.
- Proven ability to engineer and automate end-to-end restoration workflows.
Preferred
- Hands-on experience leading technical recovery efforts from an actual cyber attack or destructive incident.
- Experience with chaos engineering tooling to stress-test recovery assumptions.
- Familiarity with NIST SP 800-34 (Contingency Planning) or similar frameworks.
- AWS Security Specialty certification or equivalent demonstrated expertise.
Benefits & conditions
The estimated base salary range for new hires into this role is $205,000-$233,000 annually plus annual bonus, depending on factors such as job-related skills, relevant experience, and location. We also offer a competitive benefits package, including 401(k) match; medical, dental, and vision insurance; life and disability insurance; generous paid time off including vacation, sick leave, floating and fixed holidays, and maternity and bonding leave; EAP and other wellbeing resources; and much more.