Site Reliability Engineer

TCN & Co LLC

St. George, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

St. George, United States of America

Tech stack

Java

API

Application Performance Management

Bash

Border Gateway Protocol

Configuration Management

Databases

Continuous Delivery

Software Debugging

Linux

DevOps

Distributed Data Store

Systems Analysis

Internet Protocol Security (IPsec)

IP Routing

Python

Network Protocols

Node.js

Open Shortest Path First

Reliability Engineering

Ruby

Subsystems

System Software

Google Cloud Platform

Kubernetes

Infrastructure Automation Frameworks

Information Technology

Programming Languages

Job description

TCN is looking for a Site Reliability Engineer to join our team in Saint George, Utah. The Site Reliability Engineer works as part of a team to analyze, troubleshoot, deploy, monitor, and maintain TCN's large, global-scale production environment. These significant responsibilities are carried out with continual attention to reliability, scalability, resilience, security, and performance. The Site Reliability Engineer's responsibilities are critical to the continuity of the services provided to TCN's clients.

Designs and deploys software/systems - Collaborates with development teams throughout the product life cycle, including but not limited to engaging in the design, development, deployment, and ongoing delivery of services; assists in ensuring the development of software and systems that increase product reliability and organizational efficiency
Manages solutions and ensures resistance to failure - Deploys and manages solutions for platform infrastructure as we continue to grow our global scale; ensures resistance to failure
Troubleshoots - Troubleshoots complicated, cross-platform incidents for OS, networking, and database in a cloud-based SaaS environment; handles live production incidents, debugs and troubleshoots application and infrastructure issues, and follows and implements best practices
Post-incident evaluation - Participates in post-incident evaluations and ensures permanent closure of incidents
Monitors performance | Improves application stability - Monitors application performance and takes steps to improve application performance and stability; follows through with implementation
Conducts analysis and development improvements - Conducts system analysis, configuration management, and development improvements for system software performance, availability, and reliability
Identifies application patterns and analytics in support of better service level objectives
Incident response - Participates in 24x7 incident response and on-call rotation
Shares best practices - Shares understanding of Site Reliability Engineering culture across the organization; shares knowledge of best practices, approaches, documentation, and code with team members and other teams

Requirements

Do you have experience in stakeholder relationship building?
Do you have a Bachelor's degree?

The ideal candidate will have at least three (3) years' experience working in a Linux environment as a System Administrator, Site Reliability Engineer, or a similar role.

Bachelor's degree in computer science, information technology, or related field of study
Not less than three (3) years' experience in a Linux environment as a System Administrator, Site Reliability Engineer, or similar role
Demonstrated advanced knowledge of networking protocols, including but not limited to IP routing (static/BGP/OSPF), TCP/UDP fundamentals, security (TLS, IPsec), and common application protocols
Demonstrated advanced knowledge of Linux operating environment including storage, network, and container subsystems
Proven skills in incident management and root cause analysis
Demonstrated experience with Google Cloud Platform (APIs and CLIs)
Experience with configuration management tools
Experience with scripting and automation in commonly used languages, including but not limited to Bash, Ruby, and Python
Familiarity with programming languages used for DevOps/Continuous Delivery, including but not limited to Go, Java, and Node.js
Experience with distributed storage, containers, containerizing applications, and container orchestration (Kubernetes)
Excellent communication skills, both oral and written; ability to adapt message/style to fit audience (e.g., ability to communicate technical concepts to a non-technical audience)
Strong interpersonal skills with the ability to work with all levels of management and employees; ability to gain credibility, provide effective customer service, and foster positive working relationships with internal and external stakeholders
Excellent attention to detail; ability to work accurately and to identify, analyze, prevent, and solve problems

Benefits & conditions

Medical Insurance (HDHP with HSA)
Dental Insurance
Vision Insurance
Life Insurance
401(k) with employer match
Competitive salary
Paid time off
Paid holidays (11 scheduled)
Weekly lunches; free drinks and snacks
Casual dress and flexible work environment

About the company

TCN is a fast-growing technology company and provides all its services over the internet in a cloud-based software-as-a-service model. TCN's technology stack and culture are positive and forward-thinking. When you join TCN, you are joining a dedicated team of professionals. Employees often describe our culture as friendly, collaborative, flexible, and fast-paced. To learn more, visit our website.
