Senior Site Reliability Engineer (contract)

Wells Fargo

Charlotte, United States of America

27 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Charlotte, United States of America

Tech stack

API

Application Release Automation

Build Automation

Azure

Bash

Cloud Computing

Cloud Engineering

Continuous Delivery

Continuous Integration

Data Centers

Linux

DevOps

GitHub

Monitoring of Systems

Python

OpenShift

PowerShell

Reliability Engineering

Ansible

Prometheus

Software Deployment

Data Logging

Cloud Platform System

System Availability

Delivery Pipeline

Grafana

Hybrid Cloud

Containerization

Kubernetes

Deployment Automation

Performance Monitor

Terraform

Splunk

AppDynamics

DevSecOps

Jenkins

Job description

We are seeking an experienced Platform Reliability / SRE Engineer to ensure the reliability, performance, and smooth operation of our enterprise Harness Continuous Delivery (CD) platform. This role is hands-on, automation-focused, and central to supporting our development teams across multiple environments.

Platform Reliability & Operations

Ensure end-to-end reliability, availability, and performance of the Harness CD platform across non-prod, prod, and BCP environments
Monitor and report on SLIs, SLOs, error budgets, deployment success rates, and platform health
Lead incident response and troubleshooting for deployment failures, outages, or performance issues
Identify and resolve scaling, performance, and capacity challenges across delegates, pipelines, Kubernetes clusters, and cloud integrations

Automation & Engineering Excellence

Build automation for provisioning, configuration, scaling, upgrades, and ongoing maintenance of Harness components
Develop Infrastructure as Code (IaC) using Terraform, Ansible, Helm, or similar tools
Automate operational tasks including delegate lifecycle management, cluster onboarding, secret rotation, and pipeline validation
Reduce manual work by creating repeatable, self-service automation workflows

DevOps & CI/CD Integration

Maintain and improve integrations between Harness and tools such as GitHub, Jenkins, Azure DevOps, Kubernetes/OpenShift, and cloud platforms
Enhance developer experience by supporting efficient, reliable deployment pipelines
Partner with DevOps teams on deployment strategies (blue/green, canary, rolling updates)
Work with Security teams to embed DevSecOps practices, including policy enforcement and governance pipelines

Observability & Monitoring

Build and maintain monitoring, logging, dashboards, and alerting for all Harness components
Use tools such as Splunk, Prometheus, Grafana, or AppDynamics to create actionable alerts
Detect and escalate issues such as pipeline delays, delegate saturation, API errors, and Kubernetes resource constraints
Support proactive monitoring to reduce detection and resolution time

Modernization & Continuous Improvement

Assist with Harness upgrades, patches, and lifecycle maintenance
Support modernization initiatives such as containerization, cloud-native deployments, and multi-cloud expansion
Assist with resiliency activities including BCP testing and backup verification
Evaluate new Harness features and modules for enterprise adoption

Technical Leadership

Serve as a technical SME for the Harness platform
Create documentation, architecture details, and operational runbooks
Partner with senior engineers to enhance automation standards and platform best practices

Requirements

Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
Demonstrated experience in DevOps, SRE, Platform Engineering, or Cloud Engineering
Demonstrated hands-on experience with Harness CD
Strong experience with Kubernetes/OpenShift, Linux, and cloud deployment best practices
Solid understanding of CI/CD workflows and release automation
Experience applying SRE concepts (SLIs, SLOs, error budgets, reliability improvements)
Strong scripting and automation skills using Python, Bash, PowerShell, and Ansible
Experience with Infrastructure as Code (Terraform, Ansible, Helm, or similar)
Experience with monitoring and logging tools such as Prometheus, Grafana, Splunk, ELK, or AppDynamics
Strong troubleshooting skills across containers, OS, networking, platforms, and cloud environments
Data center migration experience (preferred)
Experience supporting enterprise-scale CD platforms (preferred)
Experience in hybrid cloud or cloud-native environments (Azure, GCP) (preferred)
Familiarity with DevSecOps, governance models, and policy automation (preferred)
Experience supporting complex upgrades, migrations, or modernization projects (preferred)