Senior Cloud Systems Engineer
Job description
- Own and manage Stargate production releases and deployment pipelines using GitOps practices
- Drive operational excellence initiatives, including metrics collection, log aggregation, uptime monitoring, KPI tracking, and SIM integration
- Achieve and maintain uptime SLAs of 99.99% (four nines) to 99.999% (five nines)
- Design, develop, and maintain Helm charts for Stargate and related infrastructure components
- Implement and manage progressive deployment strategies, including canary and blue-green deployments
- Oversee critical Kubernetes infrastructure, including volume management, DNS configuration, load balancer provisioning, and secrets monitoring and management
- Manage and optimize Kubernetes deployments and related AWS services
- Implement and maintain an observability stack using OpenTelemetry for comprehensive monitoring and alerting
- Collaborate with engineering teams to establish and enforce operational best practices and reliability standards
Requirements
- 5+ years of production DevOps/SRE experience with a demonstrable track record of maintaining high-availability systems
- Kubernetes administration experience with elevated cluster access in production environments
- Strong proficiency in writing and maintaining Helm charts for complex, multi-component applications
- Hands-on experience implementing canary deployments, blue-green deployments, and other progressive delivery patterns
- Deep knowledge of Kubernetes infrastructure management: persistent volumes, DNS/networking, load balancers, and secrets management
- Production experience with GitOps workflows and Flux CD
- Proven track record of maintaining 99.99%+ uptime in production environments
- Excellent judgment and decision-making skills when working with production systems
Preferred qualifications
- Experience with AWS cloud services, particularly EKS (Elastic Kubernetes Service), Secrets Manager, VPC networking, IAM, and AWS Load Balancers
- Experience with Karpenter for Kubernetes node autoscaling and cluster optimization
- Experience with OpenTelemetry instrumentation and observability platforms
- Kubernetes certifications (CKA, CKAD, or CKS)
- Experience building and maintaining CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, etc.)
- Knowledge of infrastructure-as-code tools (Terraform, CDK)
- Experience implementing SRE practices, including SLIs, SLOs, and error budgets
Benefits & conditions
- Compensation: Compensation level and base salary are competitively structured and determined based on factors such as relevant skills, experience, education, and the scope of the role.
- Comprehensive health coverage: Medical, dental, and vision benefits, with 70% of premiums covered by the employer
- Paid time off: Three (3) weeks of vacation per year
- Retirement plan: Up to 4% employer match on 401(k) contributions
- Paid holidays: 11 company-recognized holidays
- Parental leave
- Educational reimbursement opportunities to support company objectives, continued learning, and career development