Principal Site Reliability Engineer (U.S. Citizenship Required)
Job description
Palo Alto Networks operates large-scale infrastructure and is one of the largest Google Cloud Platform (GCP) customers.
As a Principal Site Reliability Engineer on the ADEM (Autonomous Digital Experience Management) team, you will help support the services that provide end-to-end visibility and self-healing capabilities for our global customers. The role spans automation, architecture, performance, observability, troubleshooting, security, and reliability.
Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI, ArgoCD, Prometheus, Grafana, Loki, Docker, GCP, AWS, Vault, Kafka, MySQL, Python, Bash, and Go.
Your Impact
- Drive the success of SRE and DevOps through expert contributions in CI/CD and AIOps initiatives, moving the organization toward self-healing infrastructure.
- Architect "Golden Paths" for service delivery, ensuring that SLOs, error budgets, and automated canary analysis are integrated by default.
- Design, build, and operate reliable, secure cloud infrastructure that supports high-scale synthetic monitoring and Real User Monitoring (RUM).
- Ensure applications are production-ready, scalable, and resilient, collaborating closely with developers, researchers, and data scientists.
- Develop tools and automation frameworks that champion Infrastructure as Code (IaC) and Monitoring as Code (MaC).
- Lead root cause analysis (RCA) of critical business and production issues, driving improvements that prevent recurrence.
Requirements
- Must be a U.S. citizen due to a Federal Government requirement for this role.
- 7+ years as an engineer in Infrastructure, Operations, DevOps, or System Engineering.
- Demonstrated proficiency with AI coding assistants and productivity tools such as Claude Code, Cursor, Windsurf, or GitHub Copilot to accelerate development and troubleshooting.
- Expertise in building high-availability, scalable cloud-native applications on GCP (preferred) or AWS.
- Expertise in configuration management and IaC (Terraform, Helm, Ansible).
- Strong proficiency in programming languages like Python, Go, or Java; experience with data streaming frameworks like Kafka or Apache Pulsar is a plus.
- Deep experience in Kubernetes (GKE/EKS), container networking, and Linux internals.
- Experience with GitOps principles and tools like GitLab CI and ArgoCD.
- Familiarity with compliance and security frameworks (FedRAMP, SOC 2) and experience automating them through policy-as-code.
- Excellent communication skills and the ability to rally support and collaborate across multi-functional teams.
- BS or MS in Computer Science, a related field, or equivalent professional/military experience.
The ADEM engineering team is at the core of our SASE (Secure Access Service Edge) offering. We are constantly innovating, challenging the way the industry thinks about digital experience and cybersecurity. We need individuals who are comfortable with ambiguity, excited by the prospect of a challenge, and empowered to take on the unknown risks facing everyday lives that depend on a secure digital environment.
Benefits & conditions
The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be within the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.
$151,600.00 - $245,300.00/yr
Our Commitment
We're trailblazers that dream big, take risks, and challenge cybersecurity's status quo. It's simple: we can't accomplish our mission without diverse teams innovating, together.