Principal Site Reliability Engineer (CIPE)

Palo Alto Networks
Palo Alto, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Palo Alto, United States of America

Tech stack

Artificial Intelligence
Computer Programming
Continuous Integration
Cursor (AI code editor)
Software Debugging
GitHub
Python
Pair Programming
Performance Tuning
Role-Based Access Control
Software Systems
System Programming
Systems Integration
System Availability
Delivery Pipeline
GitLab CI
Kubernetes
Prisma Cloud Platform
Jenkins
Vulnerability Analysis

Job description

As a Principal Site Reliability Engineer, you will serve as the technical authority for our cloud-native infrastructure. You aren't just managing servers; you are architecting the reliability, scalability, and security of a massive Kubernetes ecosystem. We are looking for a visionary who balances deep systems expertise with a modern, AI-augmented development workflow. You will lead the evolution of our GKE (Google Kubernetes Engine) environment, championing GitOps best practices and integrating advanced security protocols directly into our delivery pipelines.

Your Impact

Infrastructure Leadership: Architect and oversee large-scale Kubernetes clusters in GKE, ensuring high availability, performance tuning, and cost optimization.

GitOps & Orchestration: Design and refine complex CI/CD lifecycles using ArgoCD, moving toward a fully declarative infrastructure-as-code model.

Security Engineering: Implement and manage security scanning tools (e.g., Prisma Cloud, Snyk, or GKE native security) to ensure container integrity and shift-left security compliance.

Automation & Tooling: Develop sophisticated automation scripts and internal tools using Python to eliminate manual toil and improve system observability.

AI-Driven Development: Lean into the future of engineering by utilizing Cursor and Claude to accelerate coding, debugging, and documentation tasks.

Incident Management: Act as a final escalation point for complex infrastructure outages, conducting blameless post-mortems to drive systemic improvements.

Requirements

Your Experience

Kubernetes Mastery: Expert-level experience managing production K8s workloads (GKE preferred; EKS experience will also be considered), with a deep understanding of Kubernetes networking, storage, and RBAC.

CI/CD & GitOps: Hands-on expertise with ArgoCD and modern pipeline runners (GitHub Actions, GitLab CI, or Jenkins).

Programming: Proficient in Python for systems programming and automation.

Security Mindset: Proven experience integrating security scanning and compliance checks within a containerized environment.

Modern Workflow: Experience with (or a strong desire to use) AI pair-programming tools such as Cursor and Claude to multiply personal and team productivity.

Benefits & conditions

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.


Our Commitment

We're trailblazers that dream big, take risks, and challenge cybersecurity's status quo. It's simple: we can't accomplish our mission without diverse teams innovating, together.
