Site Reliability Engineer (SRE)

THE JUDGE GROUP, INC.

Charlotte, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Experience level

Senior

Compensation

$152K

Job location

Charlotte, United States of America

Tech stack

Microsoft Windows

Microsoft Active Directory

Bash

Cloud Computing

Cloud Computing Security

CompTIA Security+

Continuous Integration

Dynamic Host Configuration Protocol

Linux

Distributed Systems

DNS

Identity and Access Management

Python

PowerShell

Reliability Engineering

Ansible

Prometheus

Software Engineering

Software Vulnerability Management

SSL Certificate Management

Scripting (Bash/Python/Go/Ruby)

Google Cloud Platform

Load Balancing

Cloud Monitoring

Grafana

Reliability of Systems

Containerization

GitLab CI

Kubernetes

Infrastructure Automation Frameworks

Google Cloud Functions

Windows Security

Build Tools

Terraform

Docker

Jenkins

ServiceNow

Job description

We are looking for a Senior Site Reliability Engineer (SRE) to help scale and modernize platform operations across Windows, Linux, and cloud-native environments. In this role, you will drive the transition from application-specific support to platform-wide reliability engineering, focusing on automation, scalability, and resilience.

You will leverage your expertise in Google Cloud Platform (GCP), container orchestration, and infrastructure automation to build systems that are reliable, secure, and performant across a diverse enterprise landscape.

What You'll Do

Reliability & Cloud Infrastructure

Design, build, and maintain highly available, scalable systems across Windows, Linux, and Google Cloud Platform environments
Operate and support containerized applications using Kubernetes (GKE) and Docker
Provision and manage infrastructure using Terraform, Ansible, and Google Cloud Platform-native tools

Automation & Observability

Develop tools and automation to reduce manual effort and improve system reliability
Define and implement SLIs/SLOs to drive service performance and reliability
Build monitoring and alerting solutions using Prometheus, Grafana, and Google Cloud Operations Suite

Incident Response & Resilience

Lead incident management, root cause analysis, and postmortems
Design and implement self-healing systems and automated remediation workflows
Improve system resilience through proactive reliability engineering practices

Security & Compliance

Partner with security teams to enforce infrastructure hardening and vulnerability management
Integrate security controls into CI/CD pipelines and container platforms
Implement IAM, encryption, and policy enforcement across cloud environments

Collaboration & Enablement

Work cross-functionally with developers, infrastructure teams, and stakeholders
Create documentation, runbooks, and operational best practices
Enable teams to adopt reliable, scalable platform solutions

Requirements

3+ years of experience in Windows or Linux production support/administration
5+ years of software engineering experience or equivalent combination of work, training, or education
Experience with cloud platforms (Google Cloud Platform preferred) and distributed systems

Preferred Qualifications

Strong scripting skills (e.g., Python, PowerShell, Shell)
Hands-on experience with Google Cloud Platform services (GKE, IAM, Cloud Functions, Cloud Monitoring)
Expertise in Docker and Kubernetes
Experience with Infrastructure as Code (Terraform, Ansible)
Knowledge of Active Directory, DNS, DHCP, and Windows security
Experience with CI/CD tools (GitLab CI, Jenkins)
Familiarity with ITIL practices and change management processes
Exposure to ServiceNow, load balancing, certificate management, and endpoint security tools
Security certifications (e.g., CISSP, Security+, Google Cloud Professional Cloud Security Engineer)
Experience working in financial services or regulated industries
