Site Reliability Engineer

Scientific Research Corporation
North Charleston, United States of America
yesterday

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

North Charleston, United States of America

Tech stack

Microsoft Access
Amazon Web Services (AWS)
Systems Engineering
Audit Trail
Azure
Cloud Computing
Configuration Management
CompTIA Security+
Computer Security
Linux
Disaster Recovery
DNS
Elasticsearch
GitHub
Identity and Access Management
Information Technology Operations
Virtual Private Networks (VPN)
Network Troubleshooting
Windows Server
Networking Basics
Routing
Cisco Nexus Switches
OpenShift
PowerShell
Scrum
Red Hat Enterprise Linux (RHEL)
Reliability Engineering
Remote Administration
Site Reliability Engineering Practices
Ansible
Prometheus
Zero Trust Network Access
TCP/IP
Software Vulnerability Management
Private Cloud Environment
Datadog
SSL Certificate Management
Data Logging
Load Balancing
Grafana
Hybrid Cloud
Firewalls (Computer Science)
GitLab
DoD Secret Clearance
Kubernetes
Infrastructure Automation Frameworks
Low Latency
Rancher
Performance Monitor
CIS Benchmarks
Puppet
Terraform
Splunk
New Relic (SaaS)
DevSecOps
Cisco Networks
Docker
Jenkins
Artifactory
VMware

Job description

The Site Reliability Engineer will support a premier Navy program team in reviewing, assessing, and improving the reliability, resilience, observability, and operational maintainability of next-generation Navy afloat architecture. The candidate will work with technical staff, Navy stakeholders, cybersecurity teams, infrastructure engineers, software teams, and operational representatives to ensure that SRE principles are considered early in architecture, design, integration, test, deployment, and sustainment planning.

Responsibilities include:

  • Reviewing proposed Navy afloat architecture designs for reliability, availability, scalability, maintainability, cybersecurity alignment, and operational supportability
  • Identifying architecture and implementation risks that could affect system uptime, Fleet usability, maintainability, troubleshooting, patching, monitoring, recovery, or sustainment
  • Defining and recommending SRE-aligned practices for Navy afloat systems, including service level objectives, operational metrics, monitoring requirements, alerting thresholds, error budget concepts, incident response workflows, and reliability reporting
  • Assisting engineering teams in translating operational reliability requirements into technical design considerations, implementation standards, and sustainment procedures
  • Evaluating system designs against real-world afloat constraints, including limited bandwidth, intermittent connectivity, shipboard infrastructure limits, cybersecurity controls, maintenance windows, disconnected operations, and mission availability requirements
  • Supporting development of observability strategies, including logging, metrics, tracing, dashboards, alerts, health checks, and performance monitoring
  • Recommending automation opportunities to reduce manual operational workload, improve repeatability, reduce configuration drift, and improve deployment and sustainment reliability
  • Supporting root cause analysis for operational issues, test findings, integration failures, or architecture concerns, then converting findings into corrective actions and long-term reliability improvements
  • Assisting with reliability-focused documentation, including architecture review comments, risk assessments, operational concepts, monitoring plans, sustainment recommendations, incident response workflows, and executive-level technical summaries
  • Working with cybersecurity stakeholders to ensure reliability recommendations also support DoD cybersecurity requirements, including STIG compliance, vulnerability management, audit logging, privileged access controls, and continuous monitoring
  • Participating in technical working groups, architecture reviews, design reviews, test planning sessions, and customer briefings
  • Supporting planning for deployment, installation, test, checkout, transition to operations, and sustainment handoff activities
  • Helping define operational readiness criteria for new or updated afloat capabilities before Fleet deployment
  • Providing recommendations that balance modern SRE practices with Navy operational constraints, cybersecurity mandates, lifecycle supportability, and mission execution needs
  • Communicating clearly with both technical and nontechnical stakeholders, including government sponsors, program managers, engineers, cybersecurity staff, and operational users

Requirements

  • Ability to obtain and maintain a DoD Secret clearance
  • U.S. citizenship required due to DoD contract and clearance requirements
  • Ability to support a remote-eligible role with coordination to the primary office in Charleston, South Carolina
  • Ability to obtain CSWF / DoD 8140-aligned IAT Level II qualification within the required contract or program timeline
  • Current qualifying IAT Level II certification, or the ability to obtain one, typically one of the following:
      • CompTIA Security+ CE
      • CompTIA CySA+
      • GIAC GSEC
      • ISC2 SSCP
      • EC-Council CND
  • Five or more years of experience in one or more of the following areas:
      • Site reliability engineering
      • Systems engineering
      • Platform engineering
      • DevSecOps
      • Network or infrastructure operations
      • Cloud, hybrid cloud, or enterprise hosting environments
      • Mission-critical IT operations
  • Practical experience with Linux and Windows server environments, including system hardening, patching, configuration, troubleshooting, logging, and operational sustainment
  • Working knowledge of networking fundamentals, including TCP/IP, DNS, routing, switching, firewalls, load balancing, VPNs, segmentation, and network troubleshooting
  • Experience designing, reviewing, or operating highly available systems with attention to uptime, resilience, observability, recoverability, and operational risk
  • Experience with monitoring, alerting, log aggregation, performance analysis, and incident response
  • Understanding of SRE principles, including:
      • Service level indicators
      • Service level objectives
      • Error budgets
      • Toil reduction
      • Automation-first operations
      • Blameless post-incident review
      • Capacity planning
      • Reliability risk assessment
  • Experience supporting cybersecurity compliance in regulated environments, preferably DoD or federal environments
  • Familiarity with vulnerability management, STIGs, security baselines, patch compliance, privileged access, audit logging, and continuous monitoring
  • Ability to evaluate architecture and design decisions for operational reliability, maintainability, cybersecurity posture, and lifecycle sustainment
  • Ability to translate technical findings into clear written recommendations for government sponsors, engineering teams, cybersecurity stakeholders, and program leadership
  • Strong written and verbal communication skills, including the ability to document technical risks, operational impacts, and recommended mitigations

Desired Skills

  • Prior experience as an SRE in Fortune 100 or similar large-scale environments
  • Active DoD Secret clearance
  • Experience supporting Navy, NIWC, NAVWAR, Fleet, tactical, afloat, or shipboard systems
  • Experience with afloat or disconnected operations where bandwidth, latency, hardware constraints, cybersecurity requirements, and operational availability drive architecture decisions
  • Experience reviewing or contributing to next-generation architecture for Navy, DoD, tactical edge, or mission-critical platforms
  • Experience with DoD Risk Management Framework, Authority to Operate support, continuous monitoring, vulnerability remediation, POA&Ms, STIG implementation, and cyber inspection readiness
  • Experience with containerization and orchestration technologies such as Docker, Kubernetes, OpenShift, Rancher, or similar platforms
  • Experience with infrastructure as code and configuration management tools such as Ansible, Terraform, Puppet, Chef, PowerShell DSC, or similar technologies
  • Experience with CI/CD pipelines and secure software delivery using tools such as GitLab, Jenkins, GitHub Actions, Azure DevOps, Nexus, Artifactory, or similar platforms
  • Experience with observability platforms and tooling such as Prometheus, Grafana, ELK / Elastic Stack, Splunk, OpenTelemetry, Datadog, New Relic, or similar capabilities
  • Experience with cloud or hybrid environments, including AWS, Azure, Azure Government, GovCloud, private cloud, VMware, or other enterprise hosting platforms
  • Experience with backup, disaster recovery, failover planning, continuity of operations, and data protection for mission-critical systems
  • Experience performing root cause analysis and converting incident findings into architectural, operational, or automation improvements
  • Familiarity with Zero Trust principles, identity and access management, certificate management, privileged access management, endpoint security, and secure remote administration
  • Familiarity with Navy change control, configuration management, test events, installation readiness reviews, deployment planning, or Fleet Readiness Change Board style processes
  • Experience working directly with government customers, system owners, cybersecurity teams, network engineers, software teams, and operational users
  • One or more of the following certifications:
      • Active Security+ CE or higher DoD 8140 / IAT Level II qualifying certification
      • CompTIA CySA+
      • ISC2 SSCP
      • GIAC GSEC
      • GIAC GCIH
      • GIAC GCIA
      • GIAC GCWN or GCUX
      • Red Hat Certified System Administrator
      • Red Hat Certified Engineer
      • Certified Kubernetes Administrator
      • AWS Certified SysOps Administrator
      • AWS Solutions Architect
      • Microsoft Azure Administrator
      • VMware Certified Professional
      • Cisco CCNA or CCNP
      • ITIL Foundation
      • Certified ScrumMaster or SAFe certification, where relevant to program execution

Benefits & conditions

SRC offers a generous benefits package, including medical, dental, and vision plans; a 401(k) with a company match; life insurance; vacation and sick paid time off accruals starting at 10 days of vacation and 5 days of sick leave annually; 11 paid holidays; tuition reimbursement; a work environment that encourages excellence; and more.

For positions requiring a security clearance, selected applicants will be subject to a government security investigation and must meet eligibility requirements for access to classified information.

EEO

Scientific Research Corporation is an equal opportunity employer that does not discriminate in employment.

About the company

Scientific Research Corporation is an advanced information technology and engineering company that provides innovative products and services to government and private industry, as well as independent institutions. At the core of our capabilities is a seasoned team of highly skilled engineers and scientists with multidisciplinary backgrounds. This team is challenged daily to provide cutting edge technology solutions to our clients.
