Site Reliability Engineer (SRE) Cloud Operations

Infinite Computer Solutions (ICS)

Alpharetta, United States of America

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Alpharetta, United States of America

Tech stack

Microsoft Windows

Microsoft Active Directory

Amazon Web Services (AWS)

Azure

Banking Software

Cloud Computing

Continuous Integration

DevOps

Disaster Recovery

Event Logging

GitHub

IIS

Windows Server

PowerShell

Reliability Engineering

Site Reliability Engineering Practices

Ansible

TCP/IP

SSL Certificate Management

Transport Layer Security

Load Balancing

Perfmon

Splunk

Dynatrace

Jenkins

Job description

We are seeking a skilled Site Reliability Engineer (SRE) to support and enhance mission-critical Digital Banking platforms running on Azure/AWS, Windows Server, and IIS environments. This role focuses on reliability engineering, cloud operations, production support, observability, and automation across enterprise-scale infrastructure.

Responsibilities

Provide operational support for Digital Banking applications across Azure/AWS and Windows/IIS environments
Monitor and troubleshoot production systems using Dynatrace, Splunk, Windows Event Logs, and PerfMon
Lead and support P1/P2 incident response, root cause analysis (RCA), and service restoration
Manage IIS configurations, deployments, patching, SSL/TLS certificates, and production releases
Support high-availability, disaster recovery (DR), and load-balanced environments
Automate operational tasks using PowerShell, PowerShell Desired State Configuration (DSC), and Ansible
Collaborate with DevOps and Engineering teams to support CI/CD pipelines and improve platform reliability
Ensure adherence to enterprise security, compliance, and operational standards

Requirements

Windows Server (2016/2019/2022)
IIS Administration & Troubleshooting
Microsoft Azure and/or AWS
Dynatrace
Splunk
PowerShell Automation

Experience troubleshooting using Windows Event Logs and PerfMon
Strong understanding of TCP/IP, HTTP/S, TLS, Load Balancers, and Web Infrastructure
Experience supporting critical production incidents in enterprise environments
Knowledge of Active Directory, GPOs, service accounts, and certificate management
Experience with CI/CD tools such as Azure DevOps, GitHub Actions, or Jenkins

Preferred

Experience in Banking, Financial Services, or other regulated environments
Exposure to SRE practices, automation-first operations, and zero-downtime deployments