Site Reliability Engineer

DDC IT Services, LLC
Scottsdale, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Scottsdale, United States of America

Tech stack

Agile Methodologies
Amazon Web Services (AWS)
Application Release Automation
Systems Engineering
ArcGIS (Software)
Cloud Computing
Databases
Continuous Integration
Distributed Systems
Linux System Administration
Nagios
Performance Tuning
Reliability Engineering
Runbook
Software Vulnerability Management
Esri GIS (Software)
Data Logging
Scripting (Bash/Python/Go/Ruby)
Enterprise Software Applications
Cloud Platform System
Software Security
Reliability of Systems
SC Clearance
Kubernetes
Information Technology
DevSecOps

Job description

The Site Reliability Engineer (SRE) / Subject Matter Expert (SME) - Computer Systems Engineer/Architect will provide senior-level reach-back expertise to support the reliability, scalability, performance, and operational resilience of the GEOMAP platform in secure cloud environments. This role focuses on improving service availability, monitoring, incident response, automation, and production stability across cloud-hosted and containerized systems supporting mission-critical geospatial capabilities for the U.S. Air Force. The Site Reliability Engineer will collaborate across development, DevSecOps, cloud, database, testing, and support teams to identify systemic issues, reduce operational risk, and implement engineering solutions that improve long-term platform reliability. This position is contingent upon contract award.

Responsibilities

  • Provide senior-level engineering support to improve reliability, availability, performance, and maintainability of GEOMAP cloud-hosted systems and services.
  • Analyze production issues, recurring incidents, and operational trends to identify root causes and recommend durable corrective actions.
  • Support the design and implementation of monitoring, alerting, logging, and observability solutions across applications, infrastructure, and containerized services.
  • Develop and recommend automation approaches that reduce manual effort, improve deployment consistency, and increase system resilience.
  • Partner with software engineers, DevSecOps engineers, Kubernetes engineers, database engineers, and production support personnel to improve service health and release readiness.
  • Support incident response, problem management, service restoration, and post-incident reviews for high-priority operational issues.
  • Evaluate system performance, capacity, and scalability needs and provide recommendations for optimization and operational risk reduction.
  • Assist in defining service reliability objectives, operational metrics, and support models for sustained mission operations.
  • Contribute to infrastructure and platform engineering efforts involving cloud environments, CI/CD pipelines, container orchestration, and secure deployment patterns.
  • Support architecture reviews, technical assessments, and engineering analyses related to reliability, recoverability, and production operations.
  • Develop or refine runbooks, standard operating procedures, reliability engineering practices, and technical documentation.
  • Provide reach-back support for surge requirements, complex production investigations, and priority modernization or stabilization efforts as directed.
  • Perform other related duties as assigned.

Requirements

  • Active Secret clearance required.
  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field; Master's degree preferred.
  • Minimum of 8 years of experience supporting enterprise systems, cloud platforms, site reliability engineering, production engineering, systems engineering, or related technical roles.
  • Experience supporting AWS environments, including monitoring, performance tuning, troubleshooting, incident response, and operational sustainment.
  • Experience with Linux administration, scripting, and troubleshooting distributed applications in production environments.
  • Experience with containerized systems and orchestration platforms such as Kubernetes.
  • Experience supporting CI/CD pipelines, release automation, infrastructure-as-code, and operational reliability in Agile or DevSecOps environments.
  • Experience with monitoring, logging, and alerting tools used to support enterprise application performance and infrastructure visibility.
  • Strong analytical, troubleshooting, documentation, and communication skills, with the ability to translate operational issues into engineering improvements.
  • Ability to work effectively across cross-functional teams in a mission-focused DoD environment.

Preferred

  • Experience supporting AWS Cloud One or other secure federal cloud environments.
  • Experience supporting geospatial or Esri-based platforms, including ArcGIS Enterprise or related technologies.
  • Familiarity with service reliability practices such as SLIs, SLOs, error budgets, incident postmortems, and capacity planning.
  • Experience with Risk Management Framework (RMF), STIG compliance, vulnerability remediation, and secure system hardening practices.
  • AWS, Kubernetes, or other relevant cloud or reliability engineering certifications.
  • Experience supporting technical refresh, platform modernization, or high-availability design initiatives in enterprise environments.

Benefits & conditions

  • Tuition reimbursement
  • Health insurance
  • Retirement plan
  • Paid time off
  • Vision insurance
  • Dental insurance
  • Employee assistance program

Eligible full-time employees receive a comprehensive benefits package, including medical, dental, vision, life and disability coverage, retirement savings with company match, paid time off, voluntary supplemental benefits, and access to an employee assistance program. The package also includes educational assistance, with tuition reimbursement.

EEO Statement

About the company

Diné Development Corporation (DDC) is a Navajo Nation-owned family of companies that provides government agencies and commercial organizations with high-quality IT, professional, environmental, and research and development services. DDC is dedicated to empowering the Navajo Nation and the communities it serves.
