Principal Site Reliability Engineer (CIPE)

Palo Alto Networks

Palo Alto, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Palo Alto, United States of America

Tech stack

Artificial Intelligence

Computer Programming

Continuous Integration

Cursor

Software Debugging

GitHub

Python

Pair Programming

Performance Tuning

Role-Based Access Control

Software Systems

System Programming

Systems Integration

System Availability

Delivery Pipeline

GitLab CI

Kubernetes

Prisma Cloud Platform

Jenkins

Vulnerability Analysis

Job description

As a Principal Site Reliability Engineer, you will serve as the technical authority for our cloud-native infrastructure. You aren't just managing servers; you are architecting the reliability, scalability, and security of a massive Kubernetes ecosystem. We are looking for a visionary who balances deep systems expertise with a modern, AI-augmented development workflow. You will lead the evolution of our GKE (Google Kubernetes Engine) environment, championing GitOps best practices and integrating advanced security protocols directly into our delivery pipelines.

Your Impact

Infrastructure Leadership: Architect and oversee large-scale Kubernetes clusters in GKE, ensuring high availability, performance tuning, and cost optimization.

GitOps & Orchestration: Design and refine complex CI/CD lifecycles using ArgoCD, moving toward a fully declarative infrastructure-as-code model.

Security Engineering: Implement and manage security scanning tools (e.g., Prisma Cloud, Snyk, or GKE native security) to ensure container integrity and shift-left security compliance.

Automation & Tooling: Develop sophisticated automation scripts and internal tools using Python to eliminate manual toil and improve system observability.

AI-Driven Development: Lean into the future of engineering by utilizing Cursor and Claude to accelerate coding, debugging, and documentation tasks.

Incident Management: Act as a final escalation point for complex infrastructure outages, conducting blameless post-mortems to drive systemic improvements.

Requirements

Your Experience

Kubernetes Mastery: Expert-level experience managing production K8s workloads (preferably within GKE; EKS experience will also be considered), with a deep understanding of networking, storage, and RBAC.

CI/CD & GitOps: Hands-on expertise with ArgoCD and modern pipeline runners (GitHub Actions, GitLab CI, or Jenkins).

Programming: Proficient in Python for systems programming and automation.

Security Mindset: Proven experience integrating security scanning and compliance checks within a containerized environment.

Modern Workflow: Experience with (or a strong desire to use) AI pair-programming tools like Cursor and Claude to multiply personal and team productivity.

Benefits & conditions

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here.

Our Commitment

We're trailblazers who dream big, take risks, and challenge cybersecurity's status quo. It's simple: we can't accomplish our mission without diverse teams innovating together.
