Site Reliability Engineer (SRE)

THE JUDGE GROUP, INC.
Charlotte, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior
Compensation
$152K

Job location

Charlotte, United States of America

Tech stack

Microsoft Windows
Microsoft Active Directory
Bash
Cloud Computing
Cloud Computing Security
CompTIA Security+
Continuous Integration
Dynamic Host Configuration Protocol
Linux
Distributed Systems
DNS
Identity and Access Management
Python
PowerShell
Reliability Engineering
Ansible
Prometheus
Software Engineering
Software Vulnerability Management
SSL Certificate Management
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Load Balancing
Cloud Monitoring
Grafana
Reliability of Systems
Containerization
GitLab CI
Kubernetes
Infrastructure Automation Frameworks
Google Cloud Functions
Windows Security
Build Tools
Terraform
Docker
Jenkins
ServiceNow

Job description

We are looking for a Senior Site Reliability Engineer (SRE) to help scale and modernize platform operations across Windows, Linux, and cloud-native environments. In this role, you will drive the transition from application-specific support to platform-wide reliability engineering, focusing on automation, scalability, and resilience.

You will leverage your expertise in Google Cloud Platform, container orchestration, and infrastructure automation to build systems that are reliable, secure, and performant across a diverse enterprise landscape.

What You'll Do

Reliability & Cloud Infrastructure

  • Design, build, and maintain highly available, scalable systems across Windows, Linux, and Google Cloud Platform environments
  • Operate and support containerized applications using Kubernetes (GKE) and Docker
  • Provision and manage infrastructure using Terraform, Ansible, and Google Cloud Platform-native tools

Automation & Observability

  • Develop tools and automation to reduce manual effort and improve system reliability
  • Define and implement SLIs/SLOs to drive service performance and reliability
  • Build monitoring and alerting solutions using Prometheus, Grafana, and Google Cloud Operations Suite

Incident Response & Resilience

  • Lead incident management, root cause analysis, and postmortems
  • Design and implement self-healing systems and automated remediation workflows
  • Improve system resilience through proactive reliability engineering practices

Security & Compliance

  • Partner with security teams to enforce infrastructure hardening and vulnerability management
  • Integrate security controls into CI/CD pipelines and container platforms
  • Implement IAM, encryption, and policy enforcement across cloud environments

Collaboration & Enablement

  • Work cross-functionally with developers, infrastructure teams, and stakeholders
  • Create documentation, runbooks, and operational best practices
  • Enable teams to adopt reliable, scalable platform solutions

Requirements

  • 3+ years of experience in Windows or Linux production support/administration
  • 5+ years of software engineering experience or equivalent combination of work, training, or education
  • Experience with cloud platforms (Google Cloud Platform preferred) and distributed systems

Preferred Qualifications

  • Strong scripting skills (e.g., Python, PowerShell, Shell)
  • Hands-on experience with Google Cloud Platform services (GKE, IAM, Cloud Functions, Cloud Monitoring)
  • Expertise in Docker and Kubernetes
  • Experience with Infrastructure as Code (Terraform, Ansible)
  • Knowledge of Active Directory, DNS, DHCP, and Windows security
  • Experience with CI/CD tools (GitLab CI, Jenkins)
  • Familiarity with ITIL practices and change management processes
  • Exposure to ServiceNow, load balancing, certificate management, and endpoint security tools
  • Security certifications (e.g., CISSP, Security+, Google Cloud Professional Cloud Security Engineer)
  • Experience working in financial services or regulated industries
