Lead Infrastructure Engineer (SRE)

Wells Fargo
Concord, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior
Compensation
$224K

Job location

Concord, United States of America

Tech stack

JavaScript
VBScript
Agile Methodologies
Data analysis
Application Performance Management
Systems Engineering
Confluence
JIRA
Bash
Databases
Continuous Integration
Data Integration
Noise Reduction
Distributed Systems
JMeter
Python
PowerShell
Reliability Engineering
Site Reliability Engineering Practices
Ansible
Prometheus
Systems Integration
Data Logging
Scripting (Bash/Python/Go/Ruby)
Grafana
Reliability of Systems
Containerization
BlazeMeter
Kubernetes
Infrastructure Automation Frameworks
ArcSight Event Correlation
Splunk
AppDynamics
Dynatrace
Docker
Programming Languages

Job description

As a Lead SRE, you will be part of a high-impact team responsible for advancing and embedding SRE practices across multiple applications and critical customer journeys within the Banking Operations platform. You will play a central role in transforming how reliability, scalability, and observability are engineered and sustained, helping to shape a modern, resilient, and data-driven technology ecosystem.

This team is at the forefront of driving technology transformation across the enterprise by adopting SRE-aligned capabilities, launching new tooling, automating complex operational challenges, and integrating with modern platforms and pipelines. Leveraging your background in software and systems engineering, you will ensure that onboarded applications are highly available, resilient, and fully instrumented with end-to-end observability.

In this role, you will lead the adoption and evolution of observability practices (including metrics, logging, tracing, and telemetry) while promoting operational excellence through code, automation, and continuous improvement. You will introduce and scale data-driven insights, enabling smarter decision-making and proactive issue resolution across the ecosystem.

You will also partner closely with application and platform engineering teams to ensure services are reliable, measurable, and continuously improving. Your work will include building and enhancing CI/CD integration, validating system reliability through rigorous testing, and driving the modernization of operational practices across the organization.

In this role, you will:

  • Drive and lead Site Reliability Engineering capabilities at Wells Fargo Banking Operations, igniting the practice, principles, and culture and leading by example. Mentor and coach engineers while scaling the SRE practice within Banking Operations and partnering with peer platform-embedded SRE teams
  • Leverage enterprise capabilities, tools, and innovation to improve availability in a complex ecosystem by maturing observability practices including monitoring, logging, distributed tracing, synthetic monitoring, and chaos engineering with a focus on actionable insights and proactive detection
  • Lead the evolution of our environment by introducing self-healing and autonomic capabilities and solving complex operational and systemic issues with precision, including building and training models, automating cognitive processes, and leveraging telemetry to improve the availability and reliability of the products we provide to customers
  • Own and automate key SRE metrics and IT Service Operations processes including customer impact, golden signals and critical user journeys, % availability of critical business flows, SLO/SLI definition and adherence, error budget management, and real-time observability dashboards; automate incident response processes through data integration with unified communications and alerting/notification systems
  • Provide leadership in support responsibilities for critical applications and customer journeys onboarded to SRE, including rapid remediation of issues through Agile practices, conducting blameless postmortems, driving root cause analysis, and implementing durable solutions through continuous improvement with the goal of eliminating repeat incidents
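
For readers less familiar with the SLO and error budget terms used in the list above, the arithmetic behind them can be sketched as follows. This is a minimal illustration only, not a description of Wells Fargo tooling: the function name, its parameters, and the request-based definition of availability are all assumptions made for the example.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize availability and error-budget use for one SLO window.

    Hypothetical helper: assumes availability is measured as the share of
    successful requests, which is only one of several possible SLI choices.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: fraction of requests that succeeded in the window.
    availability = (total_requests - failed_requests) / total_requests
    # Error budget: number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Share of the budget already spent (values above 1.0 mean it is exhausted).
    budget_consumed = failed_requests / allowed_failures

    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": availability >= slo_target,
    }


if __name__ == "__main__":
    # Example window: 99.9% target, one million requests, 250 failures.
    print(error_budget_report(0.999, 1_000_000, 250))
```

Under these assumptions, a 99.9% target over one million requests tolerates about 1,000 failed requests, so 250 failures would spend roughly a quarter of the budget.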

Requirements

  • 5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience using Observability Tools with hands-on implementation of monitoring, logging, or tracing solutions utilizing Grafana, ThousandEyes, Prometheus, AppDynamics, or Splunk
  • 3+ years of application production support experience in complex, high-availability environments
  • 2+ years of experience with Confluence or Jira
  • Experience with Site Reliability Engineering (SRE), including SLO/SLI frameworks, error budgets, toil reduction, and production reliability engineering practices
  • Experience with database logging and monitoring concepts
  • Experience with Application performance monitoring and optimization using BlazeMeter, JMeter, Splunk, AppDynamics, or similar observability platforms
  • Experience with scripting or programming languages such as Bash, PowerShell, Python, Shell, VBScript, or JavaScript for automation and reliability engineering use cases
  • Experience with and understanding of AIOps and related tools such as Moogsoft or BigPanda, including event correlation and noise reduction
  • Experience with one or more automation tools such as Ansible or similar infrastructure-as-code/configuration management tools
  • Experience with Container technologies: Kubernetes, Docker, PKS, with focus on observability and reliability patterns in distributed systems

Benefits & conditions

Wells Fargo provides eligible employees with a comprehensive set of benefits, many of which are listed below. Visit Benefits - Wells Fargo Jobs (https://www.wellsfargojobs.com/en/life-at-wells-fargo/benefits) for an overview of the following benefit plans and programs offered to employees.

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

Posting End Date:

28 May 2026

Job posting may come down early due to volume of applicants.

We Value Equal Opportunity

Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.

Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit's risk appetite and all risk and compliance program requirements.

About the company

Wells Fargo maintains a drug free workplace. Please see our Drug and Alcohol Policy (https://www.wellsfargojobs.com/en/wells-fargo-drug-and-alcohol-policy) to learn more. Wells Fargo Recruitment and Hiring Requirements: a. Third-Party recordings are prohibited unless authorized by Wells Fargo. b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.
