Software Engineer 4 - Site Reliability Engineer (SRE)
Job description
In this contingent resource assignment, you will consult on complex initiatives with broad impact and large-scale planning for Software Engineering. You will help transform platform operations into a modern Site Reliability Engineering model, focusing on building scalable, resilient cloud-native infrastructure. This role emphasizes automation, observability, and reliability across hybrid environments, ensuring secure and high-performing systems aligned with enterprise standards.
Day-to-Day Responsibilities:
- Ensure availability, performance, and security across Windows, Linux, and Google Cloud Platform environments
- Engineer and support containerized workloads using Kubernetes (GKE) and Docker
- Develop infrastructure as code using Terraform, Ansible, and Google Cloud Platform-native tools
- Build automation scripts and pipelines to reduce manual effort and improve efficiency
- Implement monitoring and observability using SLIs/SLOs, Prometheus, Grafana, and Google Cloud Platform tools
- Lead incident response, root cause analysis, and postmortem activities
- Design and implement self-healing and automated remediation solutions
- Collaborate with InfoSec to enforce security controls and compliance requirements
- Partner with development and infrastructure teams to enable reliable platform services
- Document runbooks, configurations, and operational procedures
Requirements
- 5+ years of software/platform engineering or SRE experience
- 3+ years of Windows and/or Linux administration in production environments
- Experience with Google Cloud Platform, Kubernetes (GKE), and infrastructure automation (Terraform/Ansible)
Plusses:
- Scripting experience (PowerShell, Python, Bash)
- Experience with observability tools (Prometheus, Grafana, Google Cloud Platform Ops Suite)
- Experience in regulated environments (Financial Services)