Site Reliability Engineer in Austin

Energy Jobline
Austin, United States of America
yesterday

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Austin, United States of America

Tech stack

Java
Amazon Web Services (AWS)
Systems Engineering
Bash
DevOps
Distributed Systems
Monitoring of Systems
Python
Reliability Engineering
Prometheus
Datadog
Data Logging
System Availability
Grafana
Reliability of Systems
Kubernetes
Deployment Automation
Terraform
Splunk
Docker
Go

Job description

We are seeking a highly experienced Systems Analyst 3 (SRE / DevOps Engineer) to support critical production systems for a large government agency. This role focuses on Site Reliability Engineering (SRE) practices to ensure system reliability, scalability, performance, and availability.

You will collaborate with engineering teams to build resilient, automated, and observable cloud platforms.

Responsibilities

  • Design, build, and maintain highly available distributed systems
  • Manage and scale Kubernetes (EKS/GKE) and containerized environments
  • Implement and manage monitoring, logging, and observability tools
  • Define and track SLIs, SLOs, and error budgets
  • Lead incident management, root cause analysis (RCA), and postmortems
  • Develop automation scripts using Python, Go, Java, or Bash
  • Build and maintain CI/CD pipelines and Infrastructure as Code (Terraform)
  • Collaborate on deployment strategies (blue-green, canary releases)
  • Ensure security, compliance, and operational excellence

Requirements

  • 8+ years of experience in SRE / DevOps / Systems Engineering
  • Strong expertise in Linux/Unix systems
  • Hands-on experience with AWS or GCP cloud platforms
  • Deep experience with Kubernetes and Docker
  • Strong understanding of distributed systems and high availability architecture
  • Experience with monitoring & observability tools (Prometheus, Grafana, Datadog, Splunk)
  • Experience with incident management, RCA, and production support
  • Proficiency in Python, Go, Java, or Bash scripting
  • Experience with Chaos Engineering / resiliency testing
  • Knowledge of feature flags, canary deployments, and progressive delivery
  • Experience supporting 24x7 production environments / on-call rotations
  • Strong documentation and runbook creation skills

About the company

Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ global energy and engineering jobs advertised monthly, and work with the leading energy companies worldwide. We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets, as well as emerging technologies in EV, Battery, and Fusion. We are committed to offering our jobseekers the most exciting career opportunities from around the world.