Site Reliability Engineer

TCN & Co LLC
St. George, United States of America
12 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

St. George, United States of America

Tech stack

Java
API
Application Performance Management
Bash
Border Gateway Protocol
Configuration Management
Databases
Continuous Delivery
Software Debugging
Linux
DevOps
Distributed Data Store
Systems Analysis
Internet Protocol Security (IPsec)
IP Routing
Python
Network Protocols
Node.js
Open Shortest Path First
Reliability Engineering
Ruby
Subsystems
System Software
Google Cloud Platform
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Go
Programming Languages

Job description

TCN is looking for a Site Reliability Engineer to join our team in Saint George, Utah. The Site Reliability Engineer works as part of a team to analyze, troubleshoot, deploy, monitor, and maintain TCN's large, global-scale production environment. These significant responsibilities are carried out with continual attention to reliability, scalability, resilience, security, and performance. The Site Reliability Engineer's responsibilities are critical to the continuity of the services provided to TCN's clients.

  • Designs and deploys software/systems - Collaborates with development teams throughout the product life cycle, including but not limited to engaging in the design, development, deployment, and ongoing delivery of services; assists in ensuring the development of software and systems that increase product reliability and organizational efficiency
  • Manages solutions and ensures resistance to failure - Deploys and manages solutions for platform infrastructure as we continue to grow our global scale, ensuring resistance to failure
  • Troubleshoots - Troubleshoots complicated, cross-platform incidents for OS, networking, and database in a cloud-based SaaS environment; handles live production incidents, debugs and troubleshoots application and infrastructure issues, and follows and implements best practices
  • Post-incident evaluation - Participates in post-incident evaluations and ensures permanent closure of incidents
  • Monitors performance and improves application stability - Monitors application performance and takes steps to improve application performance and stability; follows through with implementation
  • Conducts analysis and development improvements - Conducts system analysis, configuration management, and development improvements for system software performance, availability, and reliability
  • Identifies application patterns and analytics in support of better service level objectives
  • Incident response - Participates in 24x7 incident response and on-call rotation
  • Shares best practices - Shares understanding of Site Reliability Engineering culture across the organization; shares knowledge of best practices, approaches, documentation, and code with team members and other teams

Requirements

The ideal candidate will have at least three (3) years' experience working in a Linux environment as a System Administrator, Site Reliability Engineer, or a similar role.

  • Bachelor's degree in computer science, information technology, or related field of study
  • Not less than three (3) years' experience in a Linux environment as a System Administrator, Site Reliability Engineer, or similar role
  • Demonstrated advanced knowledge of networking protocols, including but not limited to IP routing (static/BGP/OSPF), TCP/UDP fundamentals, security (TLS, IPsec), and common application protocols
  • Demonstrated advanced knowledge of Linux operating environment including storage, network, and container subsystems
  • Proven skills in incident management and root cause analysis
  • Demonstrated experience with Google Cloud Platform (APIs and CLIs)
  • Experience with configuration management tools
  • Experience with scripting and automation in commonly used languages, including but not limited to Bash, Ruby, and Python
  • Familiarity with programming languages used for DevOps/Continuous Delivery, including but not limited to Go, Java, and Node.js
  • Experience with distributed storage, containers, containerizing applications, and container orchestration (Kubernetes)
  • Excellent communication skills, both oral and written; ability to adapt message/style to fit audience (e.g., ability to communicate technical concepts to a non-technical audience)
  • Strong interpersonal skills with the ability to work with all levels of management and employees; ability to gain credibility, provide effective customer service, and foster positive working relationships with internal and external stakeholders
  • Excellent attention to detail; ability to work accurately and to identify, analyze, prevent, and solve problems

Benefits & conditions

  • Medical Insurance (HDHP with HSA)
  • Dental Insurance
  • Vision Insurance
  • Life Insurance
  • 401k with employer match
  • Competitive salary
  • Paid time off
  • Paid holidays (11 scheduled)
  • Weekly lunches; free drinks and snacks
  • Casual dress and flexible work environment

About the company

TCN is a fast-growing technology company that provides all its services over the internet in a cloud-based software-as-a-service model. TCN's technology stack and culture are positive and forward-thinking. When you join TCN, you are joining a dedicated team of professionals. Employees often describe our culture as friendly, collaborative, flexible, and fast-paced. To learn more, visit our website.