Lead Site Reliability Engineer

Zoom Video Communications, Inc.

San Jose, United States of America

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Experience level

Senior

Compensation

$304K

Job location

Remote

San Jose, United States of America

Tech stack

Data analysis

Border Gateway Protocol

Configuration Management

Data Centers

Software Design Patterns

Linux

File Systems

Distributed Data Store

DNS

Fault Tolerance

Python

Key Management

Linux System Administration

Performance Tuning

Reliability Engineering

Cloud Services

Ansible

Ceph

Transport Layer Security

Load Balancing

Cloud Platform System

Connectivity Problems

Gitlab

Kubernetes

U-Boot

Terraform

Docker

Jenkins

Job description

As a Senior Lead Site Reliability Engineer, you will work on our hybrid systems across the globe. You will install, configure, and monitor new systems within a network of global data centers, and you will patch and maintain thousands of physical and cloud systems worldwide. To streamline operations, you will develop automation that reduces repetitive tasks, and you will analyze and address performance bottlenecks. You will also update and troubleshoot user access permissions, resolve network connectivity issues, and maintain system firewalls.

About the Team

Zoom's SRE team is committed to delivering customer happiness, improving business efficiency, and promoting agility through innovation, data-driven insights, and automation. Our impact is reflected in smooth user experiences, optimized processes, and support for Zoom's expansion in the realm of communication and collaboration.

Responsibilities

Provide technical direction for cross-team initiatives and major incidents.
Mentor SREs and developers; define best practices and design patterns, and advocate for them across the company.
Partner with Security, Networking, and Platform teams on architecture roadmaps.
Influence vendor and hardware strategy for on-prem and cloud workloads.
Design self-healing platforms using automation, chaos engineering, and fault-tolerant patterns.
Optimize Linux systems at scale: performance tuning, kernel parameters, networking, storage, and security hardening.
Bring excellent communication skills and experience driving cross-team projects as a technical lead.
Participate in on-call shifts and incident management, and work after hours or on weekends for application releases and deployments.

Requirements

10+ years in SRE, production engineering, or large-scale systems administration
Have experience with Linux system administration (systemd, cgroups, networking, filesystems, performance analysis)
Demonstrate coding ability in at least one programming language, e.g., Python
Have experience with configuration management (Ansible), IaC (Terraform, Packer), CI/CD pipelines (Jenkins, GitLab), containers and orchestration (Docker, Kubernetes), and observability platforms.
Have experience with incident response for mission-critical environments.
Possess a security-first mindset (TPM, secure boot, identity, secrets management).
Demonstrate networking expertise: BGP, load balancing, DNS, TLS, traffic engineering.
Have experience with chaos engineering and resilience testing.
Have experience with distributed storage systems such as Ceph.
Occasional weekend work may be required
Ability to work with teams across the globe and in multiple time zones

Benefits & conditions

Minimum: $146,700.00

Maximum: $339,300.00

In addition to the base salary and/or OTE listed, Zoom has a Total Direct Compensation philosophy that takes into consideration base salary, bonus, and equity value.

Note: Starting pay will be based on a number of factors and commensurate with qualifications & experience.

We also have a location-based compensation structure; there may be a different range for candidates in this and other locations.

At Zoom, we offer a window of at least 5 days for you to apply because we believe in giving you every opportunity. Below is the potential closing date, just in case you want to mark it on your calendar. We look forward to receiving your application!

Anticipated Position Close Date:

As part of our award-winning workplace culture and commitment to delivering happiness, our benefits program offers a variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health; support work-life balance; and contribute to their community in meaningful ways.

About the company

Zoomies help people stay connected so they can get more done together. We set out to build the best collaboration platform for the enterprise, and today we help people communicate better with products like Zoom Contact Center, Zoom Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars. We're problem-solvers, working at a fast pace to design solutions with our customers and users in mind. Find room to grow with opportunities to stretch your skills and advance your career in a collaborative, growth-focused environment.

Our Commitment

At Zoom, we believe great work happens when people feel supported and empowered. We're committed to fair hiring practices that ensure every candidate is evaluated based on skills, experience, and potential. If you require an accommodation during the hiring process, let us know; we're here to support you at every step.