Sr Cloud Engineer

CompNova LLC
Oakland, United States of America
2 days ago

Role details

Contract type
Temporary to permanent
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Oakland, United States of America

Tech stack

Amazon Web Services (AWS)
Build Automation
Bash
Cloud Computing
Cloud Engineering
Code Generation
DevOps
Distributed Systems
Elasticsearch
GitHub
Monitoring of Systems
Python
Octopus Deploy
Query Optimization
Ansible
Prometheus
Cloud Collaboration
Cisco WebEx
Data Logging
Scripting (Bash/Python/Go/Ruby)
Grafana
Kubernetes
Information Technology
Kafka
Terraform
Splunk
Go
Microservices

Job description

The Grade 10 Cloud Engineer within the Customer's Cloud Collaboration Technology Group will play a key role in building and operating scalable observability and infrastructure platforms supporting Webex microservices. This role requires strong hands-on expertise in Kubernetes, cloud infrastructure, and observability systems, along with the ability to operate independently and to own components end-to-end in production environments. Candidates will demonstrate extensive use of generative AI tools for code generation and production system troubleshooting.

Design, develop, and operate observability platforms to perform logging, metrics, and/or tracing for Webex microservices.

Manage and optimize Kubernetes clusters across multi-region environments.

Own CI/CD pipelines using Argo CD and Helm.

Implement infrastructure as code (IaC) using Terraform on AWS.

Operate monitoring ecosystems, including but not limited to:

OpenSearch/ELK,

Prometheus,

Grafana,

Splunk, and

Kafka.

Build automation to detect and remediate production issues.

Ensure security compliance through vulnerability patching.

Collaborate cross-functionally to improve reliability.

Participate in on-call rotations and incident response.

Contribute to distributed system design and operations.

Requirements

General Abilities

Bachelor's degree in computer science or related field

General Technical Skills

At least eight (8) years of experience in a DevOps and/or SRE platform engineering role

Incident response and on-call operations: Demonstrated experience in a 24/7 production environment, including but not limited to:

Triaging alerts

Leading incident response

Writing post-incident reviews

Maintaining SLA commitments across large-scale distributed systems

IaC and automation: Proficiency with Terraform, Ansible, and/or equivalent IaC tooling for provisioning and managing cloud infrastructure at scale on AWS

Scripting and development: Working proficiency in Python, Golang, and/or Bash for building automation scripts, operational tooling, and/or CI/CD pipeline integrations (e.g., Drone, GitHub Actions, Argo CD)

Specific Technical Skills

Kubernetes and container orchestration: Production experience operating and troubleshooting workloads on Kubernetes at large scale (i.e., hundreds of deployments and thousands of pods), including but not limited to:

Helm chart management

Pod scheduling

Resource tuning

Multi-cluster operations

Observability stack expertise: Hands-on experience performing pipeline design, query optimization, and/or capacity planning for high-volume environments in at least two (2) of the following:

OpenSearch/Elasticsearch
