Site Reliability Engineer in Austin
Energy Jobline
Austin, United States of America
yesterday
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Job location
Austin, United States of America
Tech stack
Java
Amazon Web Services (AWS)
Systems Engineering
Bash
DevOps
Distributed Systems
Monitoring of Systems
Python
Reliability Engineering
Prometheus
Datadog
Data Logging
System Availability
Grafana
Reliability of Systems
Kubernetes
Deployment Automation
Terraform
Splunk
Docker
Go
Job description
We are seeking a highly experienced Systems Analyst 3 (SRE / DevOps Engineer) to support critical production systems for a large government agency. This role focuses on Site Reliability Engineering (SRE) practices to ensure system reliability, scalability, performance, and availability.
You will collaborate with engineering teams to build resilient, automated, and observable cloud platforms.
- Design, build, and maintain highly available distributed systems
- Manage and scale Kubernetes (EKS/GKE) and containerized environments
- Implement and manage monitoring, logging, and observability tools
- Define and track SLIs, SLOs, and error budgets
- Lead incident management, root cause analysis (RCA), and postmortems
- Develop automation scripts using Python, Go, Java, or Bash
- Build and maintain CI/CD pipelines and Infrastructure as Code (Terraform)
- Collaborate on deployment strategies (blue-green, canary releases)
- Ensure security, compliance, and operational excellence
Requirements
- 8+ years of experience in SRE / DevOps / Systems Engineering
- Strong expertise in Linux/Unix systems
- Hands-on experience with AWS or GCP cloud platforms
- Deep experience with Kubernetes and Docker
- Strong understanding of distributed systems and high availability architecture
- Experience with monitoring & observability tools (Prometheus, Grafana, Datadog, Splunk)
- Experience with incident management, RCA, and production support
- Proficiency in Python, Go, Java, or Bash scripting
- Experience with Chaos Engineering / resiliency testing
- Knowledge of feature flags, canary deployments, progressive delivery
- Experience supporting 24x7 production environments / on-call rotations
- Strong documentation and runbook creation skills
About the company
Energy Jobline is the largest and fastest-growing global Energy Job Board and Energy Hub. We have an audience reach of over 7 million energy professionals, 400,000+ monthly advertised global energy and engineering jobs, and work with the leading energy companies worldwide.
We focus on the Oil & Gas, Renewables, Engineering, Power, and Nuclear markets as well as emerging technologies in EV, Battery, and Fusion. We are committed to ensuring that we offer the most exciting career opportunities from around the world for our jobseekers.