Site Reliability Engineer - SRE

Apex Systems LLC
Austin, United States of America
29 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Austin, United States of America

Tech stack

JavaScript
.NET
API
Agile Methodologies
Application Lifecycle Management
Application Performance Management
Azure
Bash
C Sharp (Programming Language)
Software as a Service
Cloud Computing
Computer Networks
Databases
Continuous Integration
Query Languages
DevOps
DNS
GitHub
Monitoring of Systems
Hypertext Transfer Protocols (HTTP)
Infrastructure as a Service (IaaS)
Python
Load Testing
Log Analysis
SQL Azure
NoSQL
Platform as a Service (PaaS)
Performance Tuning
PowerShell
Reliability Engineering
Azure DevOps Pipelines
Kusto Query Language
Runbook
Software Engineering
SQL Databases
TCP/IP
TypeScript
Scripting (Bash/Python/Go/Ruby)
Transport Layer Security
Load Balancing
Cloud Monitoring
React
Firewalls (Computer Science)
Containerization
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Bicep
TFS
Cosmos DB
Terraform
Docker
Microservices

Job description

Our organization is seeking a motivated Site Reliability Engineer (SRE) to join our dynamic Advisor Platform Engineering team. This role focuses on safeguarding the availability, performance, and scalability of our mission-critical, Azure-hosted platform. As an individual contributor, you will apply your expertise in cloud infrastructure, automation, and observability to maintain and enhance our systems, collaborating closely with Agile development teams to embed reliability principles throughout the application lifecycle.

  • Monitor, maintain, and optimize Azure infrastructure, ensuring the health, performance, and availability of IaaS, PaaS, and SaaS components.
  • Enhance observability by defining, measuring, and refining Service Level Indicators (SLIs) and Objectives (SLOs) using Azure Monitor, Application Insights, and Log Analytics (KQL).
  • Develop automation and tooling using scripting languages (PowerShell, Bash, Python) and potentially C#/.NET to eliminate manual tasks and improve efficiency.
  • Participate in a 24/7 on-call rotation, contributing to incident triage, mitigation, root cause analysis (RCA), and the implementation of preventive actions.
  • Collaborate with software development, QA, and other technology teams to ensure reliability, scalability, and performance requirements are met.
  • Contribute to capacity planning, load testing, and performance tuning initiatives across a .NET and React microservice architecture.
  • Create and maintain clear documentation for systems, processes, and runbooks.
  • Troubleshoot and support integrations with third-party systems via APIs and SSO implementations.

This is a hybrid position requiring on-site work in South Austin, TX, from Monday to Thursday, with an option for remote work on Fridays. The role includes participation in a 24/7 on-call rotation to support critical production releases and incidents outside of standard business hours.

Requirements

Education: A Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience, is required.

Experience: This position requires 2-5 years of experience in Site Reliability Engineering, DevOps, Cloud Operations/Engineering, Systems Administration, or Software Engineering with a strong operational focus. Hands-on experience with platform development, automation strategies, and monitoring strategies within Azure is necessary. Familiarity with SQL and the .NET stack is also required for effective application troubleshooting.

Technical Skills:

  • Hands-on experience managing and troubleshooting production workloads on Microsoft Azure (IaaS & PaaS).
  • Proficiency with Azure monitoring tools (Azure Monitor, Application Insights, Log Analytics) and the Kusto Query Language (KQL).
  • Solid scripting skills for automation (e.g., PowerShell, Bash, Python).
  • Experience with CI/CD concepts and tools, particularly Azure DevOps pipelines.
  • Proficiency with Git workflows and platforms like Azure Repos or GitHub.
  • Solid understanding of networking concepts (TCP/IP, DNS, HTTP/HTTPS, TLS, Load Balancing, Firewalls).

Preferred Qualifications

  • A basic understanding or development experience with C#/.NET applications.
  • Experience with Infrastructure as Code (IaC) tools like ARM templates, Bicep, or Terraform.
  • Familiarity with containerization technologies (Docker) and orchestration (Kubernetes, Azure Container Apps).
  • Experience supporting relational (e.g., Azure SQL) and NoSQL (e.g., Cosmos DB) databases.
  • Basic familiarity with modern JavaScript front-end technologies like React/TypeScript.
  • Experience working in Agile development environments or within the financial services industry.
  • Microsoft Certified: Azure Administrator Associate (AZ-104) or DevOps Engineer Expert (AZ-400).

About the company

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico. Apex uses a virtual recruiter as part of the application process.
