Site Reliability Engineer
TSR, Inc
Alpharetta, United States of America
yesterday
Role details
Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Alpharetta, United States of America
Tech stack
Java
API
Amazon Web Services (AWS)
Systems Engineering
Azure
Bash
Cloud Computing
Computer Programming
Databases
DevOps
Distributed Systems
Python
PostgreSQL
Enterprise Messaging Systems
Microsoft SQL Server
MongoDB
PowerShell
Reliability Engineering
Software Engineering
Data Logging
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Event Driven Architecture
Infrastructure Automation Frameworks
Non-relational Database
REST
Terraform
Requirements
- 7 to 10+ years of experience in site reliability engineering, systems engineering, software engineering, DevOps, infrastructure engineering, or production operations
- Strong experience supporting highly available, distributed, cloud-based, or mission-critical technology platforms
- Hands-on experience with observability practices, including monitoring, alerting, logging, metrics, tracing, dashboards, and service health reporting
- Experience instrumenting applications, services, APIs, infrastructure, databases, and cloud components to enable end-to-end operational visibility
- Strong understanding of reliability engineering concepts, including SLIs, SLOs, SLAs, error budgets, incident management, capacity management, and operational readiness
- Experience designing actionable alerts that support rapid issue detection, triage, escalation, and resolution
- Experience building and maintaining operational dashboards for technical teams, support teams, and senior/executive stakeholders
- Strong scripting or programming skills using Python, Java, Bash, PowerShell, or similar languages for automation and operational tooling
- Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform
- Experience with Infrastructure-as-Code tools such as Terraform or similar technologies
- Experience working with pipelines, DevOps workflows, release processes, and production support models
- Experience troubleshooting distributed systems, REST services, event-driven architectures, messaging platforms, and service-to-service integrations
- Familiarity with relational and non-relational databases, such as PostgreSQL, Microsoft SQL Server, MongoDB, or similar platforms
- Strong analytical, troubleshooting, and problem-solving skills with the ability to diagnose complex technical issues across multiple layers of the stack
- Strong written and verbal communication skills, including the ability to translate technical issues into clear business and executive-level updates