Sr. Cloud Engineer

VENATOR HOLDINGS, LLC
Rochester, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Rochester, United States of America

Tech stack

Amazon Web Services (AWS)
Backup Devices
Cloud Computing
Cloud Computing Security
Cloud Engineering
Configuration Management
Continuous Integration
Disaster Recovery
GitHub
Linux System Administration
Operational Databases
Performance Tuning
Reliability Engineering
Site Reliability Engineering Practices
Prometheus
Software Systems
Systems Architecture
Software Vulnerability Management
Datadog
Data Logging
Pulumi
Cloud Platform System
Autoscaling
Istio
Delivery Pipeline
MTTR
Kubernetes Helm Charts
Kubernetes
Deployment Automation
Bitbucket
CloudWatch
Terraform
New Relic (SaaS)
Software Version Control
Bamboo

Job description

Our client is building a modern, cloud-native platform that powers connected, data-driven manufacturing operations. Their technology sits at the center of increasingly automated factories, integrating equipment, software systems, and real-time production data into a scalable SaaS platform used by global manufacturers.

To support rapid growth and platform scale, they are seeking a Senior Cloud Operations Engineer to own the reliability, performance, and operational excellence of their cloud infrastructure. This is a highly impactful role responsible for ensuring the platform remains highly available, secure, and scalable as adoption continues to grow.

This position is ideal for engineers who thrive in modern cloud environments, enjoy solving complex reliability challenges, and prefer automating everything possible. The right person will combine deep technical expertise with strong operational discipline, helping build a world-class cloud platform supporting real industrial environments.

Cloud Operations & Reliability

  • Maintain and optimize production, staging, and development environments running in Kubernetes on AWS
  • Implement and manage monitoring, logging, alerting, and observability frameworks
  • Lead incident response efforts and drive post-incident reviews focused on continuous improvement
  • Own backup, disaster recovery, and business continuity processes
  • Perform system capacity planning and performance tuning

Automation & Infrastructure Management

  • Build and maintain Infrastructure-as-Code using tools such as Terraform or Pulumi
  • Automate provisioning, configuration management, and environment lifecycle processes
  • Identify and eliminate operational inefficiencies through automation
  • Manage secrets, environment configuration, and version control across infrastructure environments

Security & Compliance

  • Implement and maintain least-privilege access models and cloud security guardrails
  • Support vulnerability management, patching workflows, and dependency maintenance
  • Assist with compliance readiness efforts including SOC 2, ISO 27001, or similar frameworks
  • Ensure proper logging, retention, and audit practices across cloud environments

FinOps / Cost Optimization

  • Monitor and optimize cloud spend across services and environments
  • Implement tagging standards, budget alerts, and cost visibility frameworks
  • Recommend architectural improvements to balance performance and cost efficiency

Collaboration & Leadership

  • Partner closely with engineering teams to improve reliability, deployment pipelines, and system architecture
  • Mentor engineers on operational best practices and cloud platform management
  • Develop runbooks, documentation, and operational standards
  • Champion reliability engineering principles, operational maturity, and risk reduction practices

Requirements

Candidates should be comfortable working in modern cloud-native environments and familiar with:

  • Kubernetes clusters, autoscaling, Helm charts, and service mesh concepts
  • AWS cloud services including compute, networking, storage, and cost management
  • Infrastructure-as-Code frameworks such as Terraform
  • Observability platforms such as Datadog, CloudWatch, Prometheus, or New Relic
  • CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Bamboo
  • Linux systems administration and troubleshooting
  • SRE practices including SLIs, SLOs, MTTR, RTO/RPO, and incident management
