SITE RELIABILITY ENGINEER

Ignite, LLC
Huntsville, United States of America
6 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior

Job location

Huntsville, United States of America

Tech stack

API
Amazon Web Services (AWS)
Code Review
Computer Engineering
Continuous Integration
Linux
DevOps
Disaster Recovery
Distributed Systems
GitHub
Python
Load Testing
Octopus Deploy
Reliability Engineering
Prometheus
Software Engineering
Software Systems
Grafana
Gatling
GitLab
Git Flow
Kubernetes
Vertica
Docker
Jenkins
Go

Job description

Ignite is currently seeking a driven, detail-oriented Site Reliability Engineer (SRE) to ensure the reliability, performance, and operational resilience of mission-critical software systems. This role focuses on defining reliability standards from the user's perspective, instrumenting systems to measure performance against those standards, and building the tooling, automation, and operational processes that make systems resilient and recoverable. The SRE will work closely with development teams to improve operational quality early in the development lifecycle, ensuring systems are designed, tested, and deployed with reliability in mind. When production issues occur, the SRE will lead incident resolution, diagnose distributed system failures, and translate operational findings into long-term reliability improvements. This position can be filled in Dayton, OH; Huntsville, AL; or St. Louis, MO. This position is contingent on contract award.

Requirements

  • Platform & Infrastructure - Kubernetes, Argo CD/GitOps, disaster recovery, capacity planning
  • Observability - OpenTelemetry (OTel) standards, Grafana/Perses, Tempo, ClickHouse, VictoriaMetrics
  • Automation & Toil Reduction - scripting, CI/CD, runbook automation, "DevOps"
  • Developer Enablement - instrumentation SDKs, SRE practice onboarding
  • Data & Alerting - dashboard quality, alert design, anomaly detection
  • 1-3 years of experience in operations, systems administration, DevOps, or software engineering
  • Bachelor's degree in Computer Science, Computer Engineering, or a related technical field
  • US citizenship; must have or be able to obtain a Top Secret clearance
  • Systems thinking - understanding how systems fail together, blast radius, and related failure modes
  • Observability fundamentals - not just the three signals (metrics, logs, traces), but knowing why and how to use telemetry to optimize services and improve engineers' quality of life
  • Basic software engineering - building automation and non-trivial APIs, Git workflows, effectively engaging in code reviews
  • Linux/networking fundamentals
  • Strong communication, collaboration, and organizational skills

Preferred Qualifications:

  • SRE certifications from the DevOps Institute, AWS Solutions Architect, or similar
  • Hands-on experience with: Python, Go, Kubernetes, Argo CD, GitLab/GitHub, Jenkins, Docker, Locust/Gatling, Prometheus, Grafana/Perses

Security Clearance Requirements:

Must have an active TS/SCI Security Clearance or the ability to obtain one.

Education Requirements:

  • Bachelor's degree in a relevant discipline.

About the company

Ignite is an ISO 9001:2015 and CMMI Services Level 3 certified Service-Disabled Veteran-Owned Small Business (SDVOSB) headquartered in Huntsville, AL. Ignite provides professional services to customers in the educational, federal, and commercial sectors and strives to be the preeminent provider in this space. Ignite upholds its values of competency, collaboration, innovation, reliability, and results in everything it does.
