DevOps Supervisor NEX

Seventy Seven Energy LLC
Houston, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior

Job location

Houston, United States of America

Tech stack

Cloud Computing
Information Systems
Continuous Integration
Relational Databases
DevOps
Identity and Access Management
JSON
Python
Key Management
Modbus
Message Queuing Telemetry Transport (MQTT)
Reliability Engineering
Prometheus
Transmission Control Protocol (TCP)
Data Logging
Cloud Platform System
MTTR
Caching
Reliability of Systems
FastAPI
Kubernetes
Information Technology
Atlassian Tools
Code Inspection
Bitbucket
Terraform
Dynatrace
Bamboo
Docker
Microservices

Job description

Cloud Infrastructure

  • Own the end-to-end cloud infrastructure strategy: networking, Kubernetes cluster management, IAM, secrets management, and cost optimization.
  • Lead all Terraform IaC development across environments (dev, staging, production), enforcing consistent module patterns and state management.
  • Design and operate Kubernetes workloads using Kustomize overlays for both cloud and edge deployment targets.
  • Manage supporting infrastructure: time-series and relational databases, caching layers, and cloud-managed services.

CI/CD & Deployment

  • Own and mature CI/CD pipelines across all services using the Atlassian suite (Bitbucket, with Bamboo or Bitbucket Pipelines), covering the building, linting, testing, publishing, and deploying of Python/FastAPI microservices.
  • Standardize Docker build practices, image tagging strategies, and container registry management.
  • Implement and enforce GitOps workflows for Kubernetes deployments, ensuring audit trails and safe rollback capabilities.
  • Collaborate with development teams to reduce deployment friction and improve feedback loops.

Edge Deployments

  • Own the deployment architecture for edge-tier workloads running on field hardware, including Docker Compose stacks with MQTT and Modbus/TCP protocol adapters.
  • Develop reliable provisioning, update, and monitoring workflows for edge nodes in remote or low-connectivity environments.
  • Coordinate with product and field operations teams on edge deployment requirements, connectivity constraints, and rollout planning.

Site Reliability & 24/7 Support

  • Build and own the on-call program: runbooks, alerting, escalation paths, and SLO definitions.
  • Lead incident response, ensuring fast mitigation and thorough post-mortems that prevent recurrence.
  • Define and track reliability metrics (availability, MTTR, error budgets) and report them to the Director of Platform Development.
  • Continuously improve observability across cloud and edge environments through structured logging, metrics, and distributed tracing.

Team Leadership & Cross-Functional Collaboration

  • Hire, mentor, and grow a team of DevOps and Platform Engineers; define career ladders and performance expectations.
  • Partner with backend engineering teams to support the Python/FastAPI microservices platform and the rollout of authentication and authorization policies.
  • Champion a security-first culture: secrets management, least-privilege IAM, dependency scanning, and compliance automation.
  • Manage vendor relationships, cloud spend, and the tooling budget with transparency to leadership.
  • Perform additional duties as required and assigned.

The DevOps / SRE Supervisor works with broad ownership and limited direction. The incumbent determines and develops the approach to infrastructure solutions. Work is evaluated on outcomes: system reliability, delivery velocity, and infrastructure cost efficiency.

Resolves a wide range of platform and infrastructure problems, from routine operational tasks to complex architectural decisions. Uses judgment within engineering best practices to determine the appropriate course of action. Problem resolution timeframes range from immediate incident response to multi-week infrastructure projects.

Requirements

  • 5+ years in DevOps, SRE, or Platform Engineering roles, with at least 1-2 years in a tech lead or supervisory capacity.
  • Deep hands-on experience with a major cloud platform (GCP preferred) including Kubernetes, IAM, networking, and managed services.
  • Strong Terraform skills: writing modules, managing remote state, and structuring multi-environment configurations.
  • Proficiency in Kubernetes and Kustomize for managing multi-environment, multi-target (cloud + edge) workloads.
  • Experience building and maintaining CI/CD pipelines in the Atlassian suite (Bitbucket, Bamboo, or Bitbucket Pipelines); comfort with pipeline-as-code patterns.
  • Solid Docker expertise including multi-stage builds, Compose stacks, and container runtime troubleshooting.
  • Hands-on experience with Prometheus, structured/JSON logging, and building actionable alerting systems.
  • Ability to lead on-call rotations and drive incident management processes end-to-end.
  • Comfortable working in a Python-centric engineering environment (Python 3.12, Poetry, FastAPI familiarity preferred).
  • Experience with edge / IoT deployment patterns - field hardware, intermittent connectivity, or OTA update strategies.
  • Demonstrated people management skills: communicates effectively, treats team members fairly and consistently, coaches well, and takes an interest in team members' career development.
  • Bachelor's degree in Computer Science, Information Systems, or a related technical field (Required).
  • 5+ years of progressive experience in DevOps, SRE, or Platform Engineering (Required).
  • 1-2 years of experience in a team lead or supervisory capacity (Required).
