Site Reliability Engineer I - Operations

Utah Valley University
Orem, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Compensation
$59K

Job location

Orem, United States of America

Tech stack

Microsoft Access
JavaScript
Microsoft Windows
Microsoft Active Directory
Agile Methodologies
Antivirus Software
Asana
Confluence
JIRA
Bash
Cluster Analysis
Collaborative Software
Databases
Data Centers
Data Security
Relational Databases
Data Center Infrastructure Management (DCIM)
Linux
DevOps
Disaster Recovery
Monitoring of Systems
Information Technology Operations
JSON
Python
Lightweight Directory Access Protocol (LDAP)
Microsoft SQL Server
MySQL
Oracle Applications
Paessler Router Traffic Grapher
Scrum
Reliability Engineering
Azure
New Relic
Prometheus
Runbook
Selenium
PL/SQL
SQL Databases
Load Balancing
Cloud Monitoring
Reliability of Systems
Firewalls (Computer Science)
Storage Technologies
Atlassian Tools
CloudWatch
Splunk
PagerDuty
ServiceNow

Job description

At Utah Valley University, this role offers the opportunity to play a critical part in supporting the infrastructure that powers teaching, learning, and daily operations across a dynamic campus environment. Working closely with senior administrators, you will manage and optimize enterprise systems and applications, ensuring reliability, security, and performance at scale. From configuring servers and maintaining system health to building monitoring solutions and automating processes through CI/CD pipelines, this position allows you to apply and grow your technical expertise while making a meaningful impact on the university community.

In addition to hands-on systems and site reliability engineering work, you will collaborate across teams on complex initiatives, contribute to innovative solutions, and help drive operational excellence. With access to modern tools like Atlassian platforms and opportunities to enhance system resilience and efficiency, this role is ideal for someone who values continuous improvement, teamwork, and purpose-driven work. UVU provides a supportive environment where your contributions directly enhance user experiences and help ensure access to reliable technology for students, faculty, and staff.

  • Under close supervision, plans epics and executes projects related to the three pillars of IT operations: operational processes, change/incident/problem management, and Ops readiness. Assists in implementing monitoring systems and alert configurations so that Operations knows about outages before users do.
  • Collaborates with leadership on the creation, facilitation, and integration of documentation, including installation steps, standard operating procedures, incident runbooks, and disaster recovery documentation, into a curated change/incident/problem management library. Assists network, application, database, and systems administrators with the enforcement of standard procedures, acts as remote hands within a secure data center, and maintains all required supplies and tooling for the deployment of physical enterprise equipment.
  • As an incident commander, participates in the business-hours on-call rotation, evaluating incoming alerts for validity and dispatching the appropriate SME to resolve issues. Executes public communications in accordance with operational standard procedures, informing stakeholders of possible service disruptions. Maintains the integrity of runbooks.
  • Performs other job-related duties as assigned.

Requirements

Do you have experience with video conferencing (communication methods)?
Do you have an associate degree?

  • An associate degree and a minimum of two years of relevant experience, or an equivalent combination of education and experience totaling four years.

  • Current CompTIA A+, Network+, Security+, or Linux+ certification, or an equivalent industry-recognized IT credential, required.

  • Knowledge of Linux and Windows operating systems, TCP/IP fundamentals, firewall management, and antivirus software.
  • Knowledge of best practices for securing operating systems, data center maintenance, and network setup.
  • Knowledge of monitoring solutions such as Prometheus, PRTG, Site24x7, TestCafe, Selenium, Splunk, New Relic, Azure Monitor, and AWS CloudWatch.
  • Knowledge of storage technologies such as SAN or NAS.
  • Knowledge of Azure Active Directory, Active Directory, and LDAP.
  • Knowledge of load balancing, clustering, and enterprise server architecture.
  • Knowledge of relational database principles and databases/languages such as PL/SQL, MySQL, Oracle, Microsoft SQL Server, or MS Access.
  • Knowledge of the Atlassian suite, including Jira, Confluence, Statuspage, and Opsgenie.
  • Knowledge of Scrum/Agile principles as applicable to a DevOps team.

  • Communicate effectively, both verbally and in writing, in normal and high-pressure situations.
  • Perform basic server, system, and application procedures such as managing user access, performing maintenance, and troubleshooting.
  • Skills in troubleshooting hardware and software problems and researching technical issues.
  • Experience using basic CLI tools in Windows and Linux operating systems to troubleshoot and gather information.
  • Skills in customer service and interpersonal communication, both verbal and written.
  • Basic scripting and programming skills with languages, formats, and tools such as Python, JavaScript, JSON, SQL, Bash, TestCafe, and Selenium.
  • Experience with instant communication and team collaboration platforms like MS Teams, Slack, or Jitsi.
  • Skills in working with an ITSM solution such as Jira, ServiceNow, or Asana.

Abilities

  • Ability to identify, research, troubleshoot, and implement solutions for hardware and software problems.

  • Ability to work in a customer-service-oriented, team-oriented, collaborative Scrum/Agile environment.

  • Highly self-motivated with the ability to learn quickly and accept feedback from peers.

  • Ability to learn the implementation process and maintenance procedures for new technologies, equipment, hardware, and software such as operating systems, ITSM tools, monitoring solutions, and data center management.

  • Ability to act as an "on-call" incident commander for communicating outages between customers, subject matter experts, teams, and leaders.

  • Ability to create visually pleasing proposals written in user-friendly language.

  • Ability to think critically and solve complex problems.

  • Ability to perform tasks in a timely and professional manner.
