DevOps & Site Reliability Engineer

VoltaGrid, LLC
Houston, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Houston, United States of America

Tech stack

Proxmox
Amazon Web Services (AWS)
Azure
Bash
Ubuntu (Operating System)
CentOS
Cloud Computing
DevOps
DNS
GitHub
Monitoring of Systems
Python
Linux System Administration
Performance Tuning
Red Hat Enterprise Linux - RHEL
Reliability Engineering
Prometheus
Virtualization Technology
Datadog
Scripting (Bash/Python/Go/Ruby)
Load Balancing
Delivery Pipeline
Grafana
Containerization
GitLab CI
Kubernetes
Performance Monitor
Terraform
Docker
Jenkins

Job description

Position Summary: We are seeking a DevOps / Site Reliability Engineer to implement and evolve the infrastructure, deployment pipelines, and reliability posture of our systems. You'll work closely with engineering teams to build scalable, observable, and resilient infrastructure while driving a culture of operational excellence.

  • Design, build, and maintain cloud infrastructure
  • Manage and optimize Kubernetes clusters and containerized workloads in production
  • Develop and maintain infrastructure-as-code using Terraform (or equivalent tooling)
  • Build and improve CI/CD pipelines to enable fast, safe, and reliable deployments
  • Implement and maintain monitoring, alerting, and observability systems (Prometheus, Grafana, Datadog, or similar)
  • Define and track SLIs/SLOs; participate in incident response, root cause analysis, and blameless postmortems
  • Identify and eliminate toil through automation and self-service tooling
  • Configure and maintain on-prem bare-metal servers and Linux-based infrastructure
  • Configure, maintain, and optimize virtualized assets
  • Collaborate with development teams on system design, capacity planning, and performance optimization
  • Participate in on-call rotations and ensure production readiness of new services

Requirements

  • 4+ years of experience in DevOps, SRE, or infrastructure engineering roles
  • Strong experience with at least one major cloud provider (AWS, GCP, or Azure; AWS preferred)
  • Deep hands-on experience with Kubernetes and Docker in production environments
  • Proficiency with infrastructure-as-code tools, particularly Terraform
  • Experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or similar)
  • Solid understanding of monitoring and observability (metrics, logs, traces)
  • Strong scripting skills (Bash, Python, or Go)
  • Experience with incident management, SLO-based reliability practices, and capacity planning
  • Strong Linux systems administration skills (Ubuntu, RHEL/CentOS, or similar)
  • Experience with virtualization platforms including VM provisioning, storage, networking, and cluster management
  • Solid understanding of networking, DNS, load balancing, and security fundamentals

Nice to Have:

  • Contributions to internal developer platforms or platform engineering initiatives
  • Proxmox VE experience
  • Certifications in cloud platforms (AWS SA, CKA, etc.)

The above statements are intended to describe the general nature and level of work being performed by employees assigned to this classification. All personnel may be required to perform duties outside of their normal responsibilities from time to time, as needed.
