Site Reliability Engineer - SRE

Apex Systems LLC

Austin, United States of America

29 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Remote

Austin, United States of America

Tech stack

JavaScript

.NET

API

Agile Methodologies

Application Lifecycle Management

Application Performance Management

Azure

Bash

C Sharp (Programming Language)

Software as a Service

Cloud Computing

Computer Networks

Databases

Continuous Integration

Query Languages

DevOps

DNS

GitHub

Monitoring of Systems

Hypertext Transfer Protocols (HTTP)

Infrastructure as a Service (IaaS)

Python

Load Testing

Log Analysis

SQL Azure

NoSQL

Platform as a Service (PaaS)

Performance Tuning

PowerShell

Reliability Engineering

Azure DevOps Pipelines

Kusto Query Language

Runbook

Software Engineering

SQL Databases

TCP/IP

TypeScript

Scripting (Bash/Python/Go/Ruby)

Transport Layer Security

Load Balancing

Cloud Monitoring

React

Firewalls (Computer Science)

Containerization

Kubernetes

Infrastructure Automation Frameworks

Information Technology

Bicep

TFS

Cosmos DB

Terraform

Docker

Microservices

Job description

Our organization is seeking a motivated Site Reliability Engineer (SRE) to join our dynamic Advisor Platform Engineering team. This role focuses on safeguarding the availability, performance, and scalability of our mission-critical, Azure-hosted platform. As an individual contributor, you will apply your expertise in cloud infrastructure, automation, and observability to maintain and enhance our systems, collaborating closely with Agile development teams to embed reliability principles throughout the application lifecycle.

Monitor, maintain, and optimize Azure infrastructure, ensuring the health, performance, and availability of IaaS, PaaS, and SaaS components.
Enhance observability by defining, measuring, and refining Service Level Indicators (SLIs) and Service Level Objectives (SLOs) using Azure Monitor, Application Insights, and Log Analytics (KQL).
Develop automation and tooling using scripting languages (PowerShell, Bash, Python) and potentially C#/.NET to eliminate manual tasks and improve efficiency.
Participate in a 24/7 on-call rotation, contributing to incident triage, mitigation, root cause analysis (RCA), and the implementation of preventive actions.
Collaborate with software development, QA, and other technology teams to ensure reliability, scalability, and performance requirements are met.
Contribute to capacity planning, load testing, and performance tuning initiatives across a .NET and React microservice architecture.
Create and maintain clear documentation for systems, processes, and runbooks.
Troubleshoot and support integrations with third-party systems via APIs and SSO implementations.

This is a hybrid position requiring on-site work in South Austin, TX, from Monday to Thursday, with an option for remote work on Fridays. The role includes participation in a 24/7 on-call rotation to support critical production releases and incidents outside of standard business hours.

Requirements

Education: A Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience is required.

Experience: This position requires 2-5 years of experience in Site Reliability Engineering, DevOps, Cloud Operations/Engineering, Systems Administration, or Software Engineering with a strong operational focus. Hands-on experience with platform development, automation strategies, and monitoring strategies within Azure is necessary. Familiarity with SQL and the .NET stack is also required for effective application troubleshooting.

Technical Skills:

Hands-on experience managing and troubleshooting production workloads on Microsoft Azure (IaaS & PaaS).
Proficiency with Azure monitoring tools (Azure Monitor, Application Insights, Log Analytics) and the Kusto Query Language (KQL).
Solid scripting skills for automation (e.g., PowerShell, Bash, Python).
Experience with CI/CD concepts and tools, particularly Azure DevOps pipelines.
Proficiency with Git workflows and platforms like Azure Repos or GitHub.
Solid understanding of networking concepts (TCP/IP, DNS, HTTP/HTTPS, TLS, Load Balancing, Firewalls).

Preferred Qualifications

A basic understanding of, or development experience with, C#/.NET applications.
Experience with Infrastructure as Code (IaC) tools like ARM templates, Bicep, or Terraform.
Familiarity with containerization technologies (Docker) and orchestration (Kubernetes, Azure Container Apps).
Experience supporting relational (e.g., Azure SQL) and NoSQL (e.g., Cosmos DB) databases.
Basic familiarity with modern JavaScript front-end technologies such as React and TypeScript.
Experience working in Agile development environments or within the financial services industry.
Microsoft Certified: Azure Administrator Associate (AZ-104) or DevOps Engineer Expert (AZ-400).

About the company

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico. Apex uses a virtual recruiter as part of the application process.