Lead Site Reliability Engineer (GCP & Kubernetes)

Htc Inc.

Celebration, United States of America

24 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Celebration, United States of America

Tech stack

Amazon Web Services (AWS)

Azure

Bash

Cloud Computing

Cloud Engineering

DevOps

Disaster Recovery

Distributed Systems

Python

Reliability Engineering

Prometheus

Google Cloud Platform

Grafana

Kubernetes Helm Charts

Multi-Cloud

Kubernetes

Cloud Migration

Terraform

Splunk

Job description

We are seeking a Lead Site Reliability Engineer to drive reliability, scalability, and operational excellence across a rapidly growing technology ecosystem. This role serves as a technical leader focused on cloud architecture, Kubernetes platforms, infrastructure automation, and highly available distributed systems. The position plays a key role in defining infrastructure strategy, improving platform resiliency, and mentoring engineering teams.

Design and support highly available cloud infrastructure in GCP
Architect and manage Kubernetes environments at scale
Build and maintain Infrastructure-as-Code using Terraform
Develop and manage Helm charts and Kubernetes deployments
Design failover, disaster recovery, and multi-region strategies
Improve platform scalability, reliability, and performance
Implement monitoring, alerting, and observability best practices
Partner with engineering teams on platform architecture and cloud adoption
Mentor engineers and provide technical leadership

Requirements

Do you have experience in Terraform?

7+ years of experience in Site Reliability Engineering, Platform Engineering, Cloud Engineering, or DevOps
Expert-level Kubernetes experience
Strong Google Cloud Platform (GCP) experience
Expertise with Terraform
Experience with Helm
Multi-cloud exposure, including AWS and Azure
Experience with distributed systems
Python or Bash scripting experience
Experience with Prometheus, Grafana, Splunk, or OpenTelemetry

SRE, DevOps, Infrastructure, Platform, or Cloud Operations: 5 years (Required)
Expert-level Kubernetes, managing deployments at scale: 3 years (Required)

Benefits & conditions

401(k)
401(k) matching
Dental insurance
Employee assistance program
Employee discount
Health insurance
Health savings account
Life insurance
Paid time off
Relocation assistance
Vision insurance
