Cloud Reliability Test Engineer
Job description
As a Cloud Reliability Test Engineer, you will own the enterprise reliability testing strategy and governance across cloud services, set standards and release criteria aligned to SLAs/SLOs, and define a multi-year roadmap for resiliency, performance, and observability. You will establish organization-wide benchmarks and guardrails to ensure consistent test coverage and deployment gates, align cross-functional leaders on reliability goals and risk management for critical user journeys, and provide executive KPIs and scorecards that drive accountability for availability, latency, MTTR, and error budget adherence.
This role is 20% hands-on technical work and 80% strategic leadership. You will oversee incident readiness to ensure corrective actions deliver measurable improvements, evaluate and standardize tooling and reference architectures, and lead enablement and maturity uplift through playbooks, training, and a community of practice. You will also influence quarterly and annual planning with input on reliability posture, risk assessments, and ROI on reliability investments.
Initially, you will focus on establishing chaos testing capabilities, challenging cloud architecture designs, and mentoring QA teams in advanced testing practices. As these foundations mature, your scope will expand to owning the enterprise reliability testing strategy, including organization-wide standards, benchmarks, and release criteria aligned to SLAs/SLOs.
Requirements
TITLE: Cloud Reliability Test Engineer (Senior)
LEVEL: Senior (Mid-level+ to Senior range acceptable)
TYPE: Individual contributor, strategic leadership role
REPORTS TO: QA Director
5+ years of QA experience and 3+ years of cloud/DevOps experience
Hands-on experience with Terraform and Kubernetes
Public cloud (AWS or Google Cloud Platform) and on-prem exposure
Performance/load testing experience
Observability tools (Splunk, Datadog, AppDynamics)
Scripting (Python, Go, or Bash)
Strong communication and mentoring ability