DevOps Supervisor NEX

Seventy Seven Energy LLC

Houston, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Junior

Job location

Houston, United States of America

Tech stack

Cloud Computing

Information Systems

Continuous Integration

Relational Databases

DevOps

Identity and Access Management

JSON

Python

Key Management

Modbus

Message Queuing Telemetry Transport (MQTT)

Reliability Engineering

Prometheus

Transmission Control Protocol (TCP)

Data Logging

Cloud Platform System

MTTR

Caching

Reliability of Systems

FastAPI

Kubernetes

Information Technology

Atlassian Tools

Code Inspection

Bitbucket

Terraform

Dynatrace

Bamboo

Docker

Microservices

Job description

Own end-to-end cloud infrastructure strategy - networking, Kubernetes cluster management, IAM, secrets management, and cost optimization.
Lead all Terraform IaC development across environments (dev, staging, production), enforcing consistent module patterns and state management.
Design and operate Kubernetes workloads using Kustomize overlays for both cloud and edge deployment targets.
Manage supporting infrastructure: time-series and relational databases, caching layers, and cloud-managed services.

CI/CD & Deployment

Own and mature CI/CD pipelines across all services using the Atlassian suite (Bitbucket, Bamboo / Bitbucket Pipelines) - building, linting, testing, publishing, and deploying Python/FastAPI microservices.
Standardize Docker build practices, image tagging strategies, and container registry management.
Implement and enforce GitOps workflows for Kubernetes deployments, ensuring audit trails and safe rollback capabilities.
Collaborate with development teams to reduce deployment friction and improve feedback loops.

Edge Deployments

Own deployment architecture for edge-tier workloads running on field hardware - Docker Compose stacks including MQTT and Modbus/TCP protocol adapters.
Develop reliable provisioning, update, and monitoring workflows for edge nodes in remote or low-connectivity environments.
Coordinate with product and field operations teams on edge deployment requirements, connectivity constraints, and rollout planning.

Site Reliability & 24/7 Support

Build and own the on-call program: runbooks, alerting, escalation paths, and SLO definitions.
Lead incident response, ensuring fast mitigation and thorough post-mortems that prevent recurrence.
Define and track reliability metrics (availability, MTTR, error budgets) and report to the Director of Platform Development.
Continuously improve observability across cloud and edge environments through structured logging, metrics, and distributed tracing.

Team Leadership & Cross-Functional Collaboration

Hire, mentor, and grow a team of DevOps and Platform Engineers; define career ladders and performance expectations.
Partner with backend engineering teams to support the Python/FastAPI microservices platform, authentication, and authorization policy rollouts.
Champion a security-first culture: secrets management, least-privilege IAM, dependency scanning, and compliance automation.
Manage vendor relationships, cloud spend, and tooling budget with transparency to leadership.
Perform additional duties as required and assigned.

The DevOps / SRE Supervisor works with broad ownership and limited direction. The incumbent determines and develops the approach to infrastructure solutions. Work is evaluated on outcomes: system reliability, delivery velocity, and infrastructure cost efficiency.

Resolves a wide range of platform and infrastructure problems, from routine operational tasks to complex architectural decisions. Uses judgment within engineering best practices to determine the appropriate course of action. Problem resolution timeframes range from immediate incident response to multi-week infrastructure projects.

Requirements

5+ years in DevOps, SRE, or Platform Engineering roles, with at least 1-2 years in a tech lead or supervisory capacity.
Deep hands-on experience with a major cloud platform (GCP preferred) including Kubernetes, IAM, networking, and managed services.
Strong Terraform skills - writing modules, managing remote state, and structuring multi-environment configurations.
Proficiency in Kubernetes and Kustomize for managing multi-environment, multi-target (cloud + edge) workloads.
Experience building and maintaining CI/CD pipelines in the Atlassian suite (Bitbucket, Bamboo, or Bitbucket Pipelines); comfort with pipeline-as-code patterns.
Solid Docker expertise including multi-stage builds, Compose stacks, and container runtime troubleshooting.
Hands-on experience with Prometheus, structured/JSON logging, and building actionable alerting systems.
Ability to lead on-call rotations and drive incident management processes end-to-end.
Comfortable working in a Python-centric engineering environment (Python 3.12, Poetry, FastAPI familiarity preferred).
Experience with edge / IoT deployment patterns - field hardware, intermittent connectivity, or OTA update strategies.
Demonstrates positive people management skills: communicates effectively, treats team members fairly and consistently, coaches well, and takes an interest in team members' career development.

Bachelor's Degree in Computer Science, Information Systems, or a related technical field (Required).
5+ years of progressive experience in DevOps, SRE, or Platform Engineering (Required).
1-2 years of experience in a team lead or supervisory capacity (Required).
