Senior Site Reliability Engineer (contract)
Wells Fargo
Charlotte, United States of America
27 days ago
Role details
Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Remote; Charlotte, United States of America
Tech stack
API
Application Release Automation
Build Automation
Azure
Bash
Cloud Computing
Cloud Engineering
Continuous Delivery
Continuous Integration
Data Centers
Linux
DevOps
GitHub
Monitoring of Systems
Python
OpenShift
PowerShell
Reliability Engineering
Ansible
Prometheus
Software Deployment
Data Logging
Cloud Platform System
System Availability
Delivery Pipeline
Grafana
Hybrid Cloud
Containerization
Kubernetes
Deployment Automation
Performance Monitor
Terraform
Splunk
AppDynamics
DevSecOps
Jenkins
Job description
We are seeking an experienced Platform Reliability / SRE Engineer to ensure the reliability, performance, and smooth operation of our enterprise Harness Continuous Delivery (CD) platform. This role is hands-on, automation-focused, and central to supporting our development teams across multiple environments.
Platform Reliability & Operations
- Ensure end-to-end reliability, availability, and performance of the Harness CD platform across non-production, production, and business continuity (BCP) environments
- Monitor and report on SLIs, SLOs, error budgets, deployment success rates, and platform health
- Lead incident response and troubleshooting for deployment failures, outages, or performance issues
- Identify and resolve scaling, performance, and capacity challenges across delegates, pipelines, Kubernetes clusters, and cloud integrations
Automation & Engineering Excellence
- Build automation for provisioning, configuration, scaling, upgrades, and ongoing maintenance of Harness components
- Develop Infrastructure as Code (IaC) using Terraform, Ansible, Helm, or similar tools
- Automate operational tasks including delegate lifecycle management, cluster onboarding, secret rotation, and pipeline validation
- Reduce manual work by creating repeatable, self-service automation workflows
DevOps & CI/CD Integration
- Maintain and improve integrations between Harness and tools such as GitHub, Jenkins, Azure DevOps, Kubernetes/OpenShift, and cloud platforms
- Enhance developer experience by supporting efficient, reliable deployment pipelines
- Partner with DevOps teams on deployment strategies (blue/green, canary, rolling updates)
- Work with Security teams to embed DevSecOps practices, including policy enforcement and governance pipelines
Observability & Monitoring
- Build and maintain monitoring, logging, dashboards, and alerting for all Harness components
- Use tools such as Splunk, Prometheus, Grafana, or AppDynamics to create actionable alerts
- Detect and escalate issues such as pipeline delays, delegate saturation, API errors, and Kubernetes resource constraints
- Support proactive monitoring to reduce detection and resolution time
Modernization & Continuous Improvement
- Assist with Harness upgrades, patches, and lifecycle maintenance
- Support modernization initiatives such as containerization, cloud-native deployments, and multi-cloud expansion
- Assist with resiliency activities including BCP testing and backup verification
- Evaluate new Harness features and modules for enterprise adoption
Technical Leadership
- Serve as a technical SME for the Harness platform
- Create documentation, architecture details, and operational runbooks
- Partner with senior engineers to enhance automation standards and platform best practices
Requirements
- Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
- Demonstrated experience in DevOps, SRE, Platform Engineering, or Cloud Engineering
- Demonstrated hands-on experience with Harness CD
- Strong experience with Kubernetes/OpenShift, Linux, and cloud deployment best practices
- Solid understanding of CI/CD workflows and release automation
- Experience applying SRE concepts (SLIs, SLOs, error budgets, reliability improvements)
- Strong scripting and automation skills using Python, Bash, PowerShell, and Ansible
- Experience with Infrastructure as Code (Terraform, Ansible, Helm, or similar)
- Experience with monitoring and logging tools such as Prometheus, Grafana, Splunk, ELK, or AppDynamics
- Strong troubleshooting skills across containers, OS, networking, platforms, and cloud environments
- Data center migration experience (preferred)
- Experience supporting enterprise-scale CD platforms (preferred)
- Experience in hybrid cloud or cloud-native environments (Azure, GCP) (preferred)
- Familiarity with DevSecOps, governance models, and policy automation (preferred)
- Experience supporting complex upgrades, migrations, or modernization projects (preferred)