Senior Cloud Systems Engineer in Arvada
Role details
Job description
- Own and manage Stargate production releases and deployment pipelines using GitOps practices
- Drive operational excellence initiatives including metrics collection, log aggregation, uptime monitoring, KPI tracking, and SIM integration
- Achieve and maintain 99.99% (four nines) to 99.999% (five nines) uptime SLAs
- Design, develop, and maintain Helm charts for Stargate and related infrastructure components
- Implement and manage progressive deployment strategies including canary deployments and blue-green deployments
- Oversee critical Kubernetes infrastructure including volume management, DNS configuration, load balancer provisioning, and secret monitoring/management
- Manage and optimize Kubernetes deployments and related AWS services
- Implement and maintain an observability stack using OpenTelemetry for comprehensive monitoring and alerting
- Collaborate with engineering teams to establish and enforce operational best practices and reliability standards
Requirements
- 5+ years of production DevOps/SRE experience with a demonstrable track record of maintaining high-availability systems
- Kubernetes administration experience with elevated cluster access in production environments
- Strong proficiency writing and maintaining Helm charts for complex, multi-component applications
- Hands-on experience implementing canary deployments, blue-green deployments, and other progressive delivery patterns
- Deep knowledge of Kubernetes infrastructure management: persistent volumes, DNS/networking, load balancers, and secrets management
- Production experience with GitOps workflows and Flux CD
- Proven track record maintaining 99.99%+ uptime in production environments
- Excellent judgment and decision-making skills when working with production systems
Qualifications:
- Experience with AWS cloud services, particularly EKS (Elastic Kubernetes Service), Secrets Manager, VPC networking, IAM, and AWS Load Balancers
- Experience with Karpenter for Kubernetes node autoscaling and cluster optimization
- Experience with OpenTelemetry instrumentation and observability platforms
- Kubernetes certifications (CKA, CKAD, or CKS)
- Experience building and maintaining CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, etc.)
- Knowledge of infrastructure-as-code tools (Terraform, CDK)
- Experience implementing SRE practices including SLIs, SLOs, and error budgets
Benefits & conditions
Compensation & Benefits: Compensation level and base salary are competitively structured and thoughtfully determined based on factors such as relevant skills, experience, education, and the scope of the role.
- Comprehensive health coverage: Medical, dental, and vision benefits, with 70% of premiums covered by the employer
- Paid time off: Three (3) weeks of vacation per year
- Retirement plan: Up to 4% employer match on 401(k) contributions
- Paid holidays: 11 company-recognized holidays
- Parental leave
- Educational reimbursement opportunities to support company objectives, continued learning, and career development