Sr Site Reliability Engineer (Advanced Threat Protection)

Palo Alto Networks
Santa Clara, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$195K

Job location

Santa Clara, United States of America

Tech stack

Adobe InDesign
Artificial Intelligence
Amazon Web Services (AWS)
Application Services
Automation of Tests
Google BigQuery
Cloud Computing
Databases
Continuous Integration
Linux
DevOps
Distributed Systems
Web Servers
Identity and Access Management
Python
PostgreSQL
Log Analysis
MySQL
Redis
Reliability Engineering
Ansible
Shell Script
Application Enhancement Tool
Large Language Models
Multi-Cloud
Gitlab
Information Technology
Terraform
Oracle Cloud Infrastructure
Artifactory
Microservices

Job description

  • Design, build, and operate cloud infrastructure that enables reliable, rapid deployment of microservices with resilient operations and effective monitoring
  • Leverage AI/ML to automate incident detection, root cause analysis, and remediation - reducing toil and accelerating mean time to resolution
  • Build and integrate AI-powered tools (e.g., LLM-based agents, AIOps platforms) into SRE workflows for intelligent alerting, log analysis, and capacity planning
  • Write automation code for provisioning and operating infrastructure at massive scale
  • Develop self-healing systems that can automatically detect anomalies, diagnose issues, and take corrective action with minimal human intervention
  • Work with development teams to ensure applications are production-ready, scalable, and reliable from the ground up
  • Identify and drive opportunities to improve automation for code deployment, management, and observability of application services
  • Establish end-to-end monitoring and alerting on all critical components, incorporating AI-driven anomaly detection and predictive analytics
  • Participate in the on-call rotation supporting the platform and production applications
  • Lead root cause analysis of critical business and production issues, building runbooks and automation to prevent recurrence
  • Mentor other SREs on best practices in infrastructure orchestration, production troubleshooting, and AI-augmented operations
  • Represent SRE in design reviews and work cross-functionally with engineering teams on operational readiness

Requirements

  • 5+ years of experience in DevOps, Site Reliability, or infrastructure engineering
  • Expertise in multi-cloud environments - strong hands-on experience with GCP and AWS, plus familiarity with OCI (Oracle Cloud Infrastructure)
  • Experience designing and operating infrastructure across multiple cloud providers, including networking, identity management, and cross-cloud connectivity
  • Expertise in Infrastructure as Code with tools such as Terraform and Ansible
  • Strong proficiency in Python and shell scripting for automation
  • Strong experience with Linux and distributed systems handling high-volume transactions
  • Familiarity with CI/CD pipelines, GitLab, and Artifactory
  • Strong fundamentals in HTTP, web servers, and networking
  • BS or MS in Computer Science, a related field, or equivalent professional experience
  • Excellent problem solving, critical thinking, communication, and teamwork skills
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Experience applying AI/ML to operational workflows (e.g., AIOps, intelligent alerting, automated remediation, or LLM-powered tooling) is a strong plus
  • Experience with cloud compliance frameworks (FedRAMP, IL5) and operating in regulated environments is a plus
  • Experience building and managing large database systems - relational (MySQL, PostgreSQL) and non-relational (Redis, BigQuery, etc.) - is a plus

Benefits & conditions

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for non-sales roles) or base salary + commission target (for sales/commissioned roles) is expected to be the annual range listed below. The offered compensation may also include restricted stock units and a bonus. A description of our employee benefits may be found here (https://benefits.paloaltonetworks.com/).

$120,300.00 - $194,525.00/yr

Our Commitment

We're trailblazers that dream big, take risks, and challenge cybersecurity's status quo. It's simple: we can't accomplish our mission without diverse teams innovating, together.
