Site Reliability Engineer Lead

ESG

Houston, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Houston, United States of America

Tech stack

Airflow

Amazon Web Services (AWS)

Google BigQuery

Cloud Storage

Continuous Integration

Information Engineering

Data Systems

DevOps

Data Flow Control

GitHub

Identity and Access Management

Python

Operational Data Store

Reliability Engineering

Site Reliability Engineering Practices

Cloud Services

Data Streaming

Data Logging

Google Cloud Platform

Cloud Platform System

Cloud Monitoring

GitLab CI

Kubernetes

Information Technology

Data Management

Terraform

Job description

Lead SRE practices for Google Cloud Platform-based data platforms
Design and own SLIs, SLOs, error budgets, and reliability metrics
Build and maintain cloud-native observability (monitoring, logging, alerting)
Lead incident response for production cloud systems and drive postmortems
Partner with data engineering and platform teams to design reliable architectures
Automate operational workflows using Python
Drive improvements in CI/CD, infrastructure as code, and deployment safety
Mentor engineers and set SRE best practices across the team

Requirements

We are seeking a Site Reliability Engineer Lead to own and evolve the reliability, scalability, and operational excellence of cloud-native data platforms running primarily on Google Cloud Platform (GCP). This role supports data systems that ingest, process, and serve large volumes of operational data from oilfield and energy environments. The ideal candidate is a cloud-first SRE with deep Google Cloud Platform experience, strong Python engineering skills, and a track record of leading reliability initiatives for data-intensive systems.

7+ years in SRE, Cloud Platform Engineering, or DevOps

Strong hands-on experience with Google Cloud Platform, including:
Google Cloud Platform: GKE, Compute Engine, Cloud Storage, Pub/Sub (or equivalents)
Cloud Monitoring & Logging
BigQuery
Dataflow
Datastream
IAM and networking
Composer/Airflow
Kubernetes: deployment, scaling, reliability patterns
CI/CD: GitHub Actions, GitLab CI, or similar
Observability: Google Cloud Platform Cloud Monitoring, Logging
Experience supporting cloud-native data systems (batch and streaming)
Production experience with Python for automation, tooling, or services
Infrastructure as Code experience (Terraform strongly preferred)
Experience operating systems in 24/7 production environments

Bachelor's degree in Business, Information Technology, Computer Science, or a related field
5+ years experience in Site Reliability Engineering, Cloud Platform Engineering, or DevOps
3+ years operating production workloads on Google Cloud Platform (GCP)
Prior technical leadership experience (lead engineer, tech lead, or ownership of reliability initiatives)
Ability to understand and speak English at a level of proficiency that allows the employee to issue, receive, and respond to both safety and operations-related directions in English

Preferred Qualifications:

Oil and Gas Industry knowledge
Technology/Digital Industry knowledge

About the company

The Evolving Oil Field Demands Evolving Service Providers

NexTier is a leading provider of integrated completions that employs sustainable practices and equipment to support our customers' ESG goals while accelerating production in the most demanding US land basins. Patterson-UTI is committed to a workplace free from discrimination and harassment, offering equal employment opportunities to all individuals regardless of personal characteristics protected by law. Employees are encouraged to report any concerns through multiple channels.
