Site Reliability Engineer NEX

Seventy Seven Energy LLC

Houston, United States of America

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Houston, United States of America

Tech stack

Java

Airflow

Amazon Web Services (AWS)

Google BigQuery

Cloud Storage

Continuous Integration

DevOps

Distributed Systems

Data Flow Control

Github

Identity and Access Management

Python

Reliability Engineering

Prometheus

Software Engineering

Data Logging

Google Cloud Platform

Cloud Platform System

Cloud Monitoring

System Availability

Grafana

Reliability of Systems

Containerization

Gitlab-ci

Kubernetes

Information Technology

Terraform

Docker

Job description

The NexTier Technology team is looking for a Site Reliability Engineer (SRE) to help build, scale, and maintain highly reliable systems on Google Cloud Platform (GCP). This role blends software engineering with infrastructure expertise to ensure our services are performant, resilient, and cost-efficient. Candidates will work closely with engineering teams to improve system reliability, automate operations, and embed best practices across the platform.

Detailed Description:

Design, implement, and manage scalable, reliable infrastructure on GCP
Maintain and improve system availability, performance, and latency
Build and manage infrastructure as code using tools like Terraform or similar
Develop automation to reduce manual operational work
Monitor systems using observability tools (metrics, logs, tracing) and respond to incidents
Participate in on-call rotations and lead incident response and postmortems
Collaborate with development teams to improve service reliability and deployment processes
Optimize cloud resource usage and cost efficiency
Implement and maintain CI/CD pipelines for reliable software delivery
Define and track SLIs, SLOs, and error budgets

Requirements

3+ years of experience in Site Reliability Engineering, DevOps, or similar roles
Strong experience with Google Cloud Platform (GCP) services (e.g., Compute Engine, GKE, Cloud Run, Cloud Storage)
Experience with containerization and orchestration (Docker, Kubernetes)
Proficiency in at least one programming language (e.g., Python, Go, Java)
Experience with Infrastructure as Code (Terraform preferred)
Solid understanding of networking, security, and distributed systems
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Cloud Monitoring)
Familiarity with CI/CD pipelines and automation tools
Hands-on experience with Google Cloud Platform, including:
GCP: GKE, Compute Engine, Cloud Storage, Pub/Sub (or equivalents)
Cloud Monitoring & Logging
BigQuery
Dataflow
Datastream
IAM and networking
Composer/Airflow
Kubernetes: deployment, scaling, reliability patterns
CI/CD: GitHub Actions, GitLab CI, or similar
Observability: GCP Cloud Monitoring, Logging
Experience operating systems in 24/7 production environments
Bachelor's degree in Business, Information Technology, Computer Science, or a related field
3+ years of experience in Site Reliability Engineering, Cloud Platform Engineering, or DevOps
3+ years operating production workloads on Google Cloud Platform (GCP)
Ability to understand and speak English at a level of proficiency that allows the employee to issue, receive, and respond to both safety-related and operations-related directions in English

Preferred Qualifications:

Oil and Gas Industry knowledge
Technology/Digital Industry knowledge
