DevOps & Site Reliability Engineer

VoltaGrid, LLC

Houston, United States of America

2 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Houston, United States of America

Tech stack

Proxmox

Amazon Web Services (AWS)

Azure

Bash

Ubuntu (Operating System)

CentOS

Cloud Computing

DevOps

DNS

GitHub

Monitoring of Systems

Python

Linux System Administration

Performance Tuning

Red Hat Enterprise Linux (RHEL)

Reliability Engineering

Prometheus

Virtualization Technology

Datadog

Scripting (Bash/Python/Go/Ruby)

Load Balancing

Delivery Pipeline

Grafana

Containerization

GitLab CI

Kubernetes

Performance Monitor

Terraform

Docker

Jenkins

Job description

Position Summary: We are seeking a DevOps / Site Reliability Engineer to implement and evolve the infrastructure, deployment pipelines, and reliability posture of our systems. You'll work closely with engineering teams to build scalable, observable, and resilient infrastructure while driving a culture of operational excellence.

Responsibilities:

Design, build, and maintain cloud infrastructure
Manage and optimize Kubernetes clusters and containerized workloads in production
Develop and maintain infrastructure-as-code using Terraform (or equivalent tooling)
Build and improve CI/CD pipelines to enable fast, safe, and reliable deployments
Implement and maintain monitoring, alerting, and observability systems (Prometheus, Grafana, Datadog, or similar)
Define and track SLIs/SLOs; participate in incident response, root cause analysis, and blameless postmortems
Identify and eliminate toil through automation and self-service tooling
Configure and maintain on-prem bare-metal servers and Linux-based infrastructure
Configure, maintain, and optimize virtualized assets
Collaborate with development teams on system design, capacity planning, and performance optimization
Participate in on-call rotations and ensure production readiness of new services

Requirements

4+ years of experience in DevOps, SRE, or infrastructure engineering roles
Strong experience with at least one major cloud provider (AWS, GCP, or Azure; AWS preferred)
Deep hands-on experience with Kubernetes and Docker in production environments
Proficiency with infrastructure-as-code tools, particularly Terraform
Experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
Solid understanding of monitoring and observability (metrics, logs, traces)
Strong scripting skills (Bash, Python, or Go)
Experience with incident management, SLO-based reliability practices, and capacity planning
Strong Linux systems administration skills (Ubuntu, RHEL/CentOS, or similar)
Experience with virtualization platforms including VM provisioning, storage, networking, and cluster management
Solid understanding of networking, DNS, load balancing, and security fundamentals

Nice to Have:

Contributions to internal developer platforms or platform engineering initiatives
Proxmox VE experience
Certifications in cloud platforms or Kubernetes (e.g., AWS SA, CKA)

The above statements are intended to describe the general nature and level of work being performed by employees assigned to this classification. All personnel may be required to perform duties outside of their normal responsibilities from time to time, as needed.
