Senior Infrastructure Engineer - SRE/Platform...

Wells Fargo
Irving, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Shift work
Languages
English
Experience level
Senior

Job location

Irving, United States of America

Tech stack

Adobe InDesign
API
Agile Methodologies
Azure
Cloud Computing
Cloud Computing Security
Cloud Engineering
Continuous Integration
IBM WebSphere DataPower SOA Appliances
Software Debugging
Key Management
OpenShift
Scrum
Red Hat Enterprise Linux - RHEL
Reliability Engineering
Ansible
Prometheus
Runbook
Management of Software Versions
Grafana
MTTR
Reliability of Systems
Apigee
Containerization
Git Flow
Kubernetes
Infrastructure Automation Frameworks
Performance Monitor
Enterprise Integration
API Gateway
Terraform
Splunk
Code Restructuring
Ansible Tower
AppDynamics
API Management

Job description

This role is focused on driving infrastructure stability, automation, and reliability across critical API and platform systems that support high-impact financial transactions. This is a highly visible role responsible for owning production reliability, improving operational efficiency, and enabling scalable platform capabilities across API Management, CI/CD, and supporting platform environments.

In this role, you will:

  • Lead daily support operations for Apigee OPDK and Apigee Hybrid to ensure platform uptime, stability, and performance

  • Troubleshoot runtime, policy, routing, and security issues on DataPower appliances

  • Develop specifications for complex infrastructure systems, and design and test solutions

  • Contribute to the testing of business, application and technical infrastructure requirements

  • Implement reliability improvements through Infrastructure-as-Code (IaC) using Terraform, Ansible, and GitOps

  • Develop automated recovery scripts and tools to reduce manual operational overhead

  • Review and analyze solutions for cloud security, secrets management and key rotations

  • Design, code, test, debug and document programs using Agile development practices

  • Plan and execute version upgrades, patching cycles, infrastructure migrations, and configuration refactoring

  • Improve proactive alerting to reduce mean time to detect (MTTD) and mean time to recover (MTTR)

  • Own and resolve P1/P2 high-severity incidents with quick response and deep technical troubleshooting

  • Direct the daily risk and control flow of operations, focusing on policies, procedures and work standards to ensure success

  • Participate in design discussions, architectural reviews, API governance activities, and platform modernization initiatives

  • Work with CAB (Change Advisory Board) for change planning, approvals, and execution tracking

  • Contribute to runbooks, SOPs, architectural diagrams, and platform knowledge base assets

Employees support our focus on building strong customer relationships balanced with a strong risk-mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit's risk appetite and all risk and compliance program requirements.

Requirements

  • 4+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

  • 4+ years of experience leveraging observability platforms such as BigPanda, ThousandEyes, Grafana, Prometheus, ELK, Splunk Observability, and AppDynamics to enhance service reliability and performance monitoring

  • 3+ years of experience working with Red Hat Enterprise Linux and Kubernetes, with a strong focus on Red Hat OpenShift Container Platform (OCP)

  • 3+ years of experience with Site Reliability Engineering and supporting production-grade systems

  • 3+ years of experience with automation & scripting

Desired Qualifications:

  • 4+ years of experience in IT Service Management (ITSM), with a strong background in incident, problem, and change management processes

  • Experience with API management platforms such as Apigee or API gateways

  • Exposure to IBM DataPower or similar enterprise integration tools

  • Expertise in Ansible Tower, including developing and maintaining playbooks

  • Experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure, along with familiarity with Kubernetes

  • Strong experience working in Agile/Scrum environments

  • Experience improving system reliability, scalability, and operational efficiency

  • Experience in project management and stakeholder engagement

  • Proven experience in leading cross-functional teams

  • Strong problem-solving and decision-making abilities

  • Excellent communication and collaboration skills

About the company

Wells Fargo maintains a drug-free workplace. Please see our Drug and Alcohol Policy (https://www.wellsfargojobs.com/en/wells-fargo-drug-and-alcohol-policy) to learn more.
