Site Reliability Engineer I - Operations

Utah Valley University

Orem, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Compensation

$59K

Job location

Orem, United States of America

Tech stack

Microsoft Access

JavaScript

Microsoft Windows

Microsoft Active Directory

Agile Methodologies

Antivirus Software

Asana

Confluence

JIRA

Bash

Cluster Analysis

Collaborative Software

Databases

Data Centers

Data Security

Relational Databases

Data Center Infrastructure Management (DCIM)

Linux

DevOps

Disaster Recovery

Monitoring of Systems

Information Technology Operations

JSON

Python

Lightweight Directory Access Protocol (LDAP)

Microsoft SQL Server

MySQL

Oracle Applications

Paessler Router Traffic Grapher (PRTG)

Scrum

Reliability Engineering

Azure

New Relic

Prometheus

Runbook

Selenium

PL/SQL

SQL Databases

Load Balancing

Cloud Monitoring

Reliability of Systems

Firewalls (Computer Science)

Storage Technologies

Atlassian Tools

CloudWatch

Splunk

PagerDuty

ServiceNow

Job description

At Utah Valley University, this role offers the opportunity to play a critical part in supporting the infrastructure that powers teaching, learning, and daily operations across a dynamic campus environment. Working closely with senior administrators, you will manage and optimize enterprise systems and applications, ensuring reliability, security, and performance at scale. From configuring servers and maintaining system health to building monitoring solutions and automating processes through CI/CD pipelines, this position allows you to apply and grow your technical expertise while making a meaningful impact on the university community.

In addition to hands-on systems and site reliability engineering work, you will collaborate across teams on complex initiatives, contribute to innovative solutions, and help drive operational excellence. With access to modern tools like Atlassian platforms and opportunities to enhance system resilience and efficiency, this role is ideal for someone who values continuous improvement, teamwork, and purpose-driven work. UVU provides a supportive environment where your contributions directly enhance user experiences and help ensure access to reliable technology for students, faculty, and staff.

Under close supervision, plans epics and executes projects related to the three pillars of IT operations: operational processes; change, incident, and problem management; and Ops readiness.
Assists in implementing monitoring systems and alert configurations so that Operations knows about outages before users do.

Collaborates with leadership on the creation, facilitation, and integration of documentation, including installation steps, standard operating procedures, incident runbooks, and disaster recovery documentation, into a curated change/incident/problem management library.
Assists network, application, database, and systems administrators with the enforcement of standard procedures, acts as remote hands within a secure data center, and maintains all required supplies and tooling for the deployment of physical enterprise equipment.
As an incident commander, participates in the business-hours on-call rotation, evaluating incoming alerts for validity and dispatching the appropriate SME to resolve issues.
Executes public communications in accordance with operational standard procedures, informing stakeholders of possible service disruptions.
Maintains the integrity of runbooks.
Performs other job-related duties as assigned.

Requirements

Do you have experience with video conferencing?
Do you have an associate degree?

An associate degree and a minimum of two years of relevant experience, or an equivalent combination of education and experience totaling four years.

Current CompTIA A+, Network+, Security+, or Linux+ certification, or an equivalent industry-recognized IT credential, required.

Knowledge

Knowledge of Linux and Windows operating systems, TCP/IP fundamentals, firewall management, and antivirus software.
Knowledge of best practices for securing operating systems, data center maintenance, and network setup.
Knowledge of various monitoring solutions such as Prometheus, PRTG, Site24x7, TestCafe, Selenium, Splunk, New Relic, Azure Monitor, and AWS CloudWatch.
Knowledge of storage technologies such as SAN or NAS.
Knowledge of Azure Active Directory, Active Directory, and LDAP.
Knowledge of load balancing, clustering, and enterprise server architecture.
Knowledge of relational database principles and databases/languages such as PL/SQL, MySQL, Microsoft SQL Server, Oracle, or MS Access.
Knowledge of the Atlassian suite, including Jira, Confluence, Statuspage, and Opsgenie.
Knowledge of Scrum/Agile principles as applicable to a DevOps team.

Skills

Communicate effectively, both verbally and in writing, in normal and high-pressure situations.
Perform basic server, system, and application procedures such as managing user access, performing maintenance, and troubleshooting.
Skills in troubleshooting hardware and software problems and researching technical issues.
Experience using basic CLI tools in Windows and Linux operating systems to troubleshoot and gather information.
Skills in customer service and interpersonal communication, both verbal and written.
Basic scripting and programming skills with languages, data formats, and tools such as Python, JavaScript, SQL, Bash, JSON, TestCafe, and Selenium.
Experience with instant communication and team collaboration platforms like MS Teams, Slack, or Jitsi.
Skills in working with an ITSM solution such as Jira, ServiceNow, or Asana.

Abilities

Ability to identify, research, troubleshoot, and implement solutions for hardware and software problems.
Ability to work in a customer-service-oriented, team-oriented, collaborative Scrum/Agile environment.
Highly self-motivated with the ability to learn quickly and accept feedback from peers.
Ability to learn the implementation process and maintenance procedures for new technologies, equipment, hardware, and software such as operating systems, ITSM tools, monitoring solutions, and data center management.
Ability to act as an "on-call" incident commander for communicating outages between customers, subject matter experts, teams, and leaders.
Ability to create visually pleasing proposals written in user-friendly language.
Ability to think critically and solve complex problems.
Ability to perform tasks in a timely and professional manner.
