Principal Engineer - Platform Engineering & Production Support (contract)

Wells Fargo

Irving, United States of America

1 month ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Irving, United States of America

Tech stack

Java

Apache HTTP Server

Relational Databases

DevOps

Distributed Systems

OpenShift

Red Hat Enterprise Linux - RHEL

Reliability Engineering

Site Reliability Engineering Practices

Prometheus

Cloud Platform System

React

Grafana

Spring Boot

MTTR

Reliability of Systems

Kubernetes

Kafka

Splunk

AppDynamics

ServiceNow

Microservices

Job description

The ideal candidate is a seasoned DevOps and Site Reliability Engineering (SRE) professional with strong hands-on expertise in observability, incident management, and cloud platforms (OpenShift). This role will play a leading part in supporting production systems, preventing outages, and improving system reliability through automation, intelligent monitoring, and modern SRE practices.

Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
Design, build, and maintain advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, Prometheus, and SPLOC
Proactively identify production risks through gap analysis, anomaly detection, and predictive alerting, preventing incidents before they occur
Troubleshoot complex production issues across distributed microservices environments, driving reduced MTTR through deep technical expertise
Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring
Support applications running on OpenShift and cloud-native platforms, with a strong focus on reliability, scalability, and resiliency
Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
Participate in a 24x7 on-call rotation, demonstrating urgency, ownership, and accountability during incidents
Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering
Act as a trusted technical leader, able to rapidly shift priorities and manage competing demands in high-pressure environments

Requirements

Do you have experience in distributed computing? We are seeking a Principal Engineer within the Platform Engineering team. This individual must be Day 1 ready, comfortable operating in fast-paced, production-critical environments, and capable of balancing multiple competing priorities.

Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
Strong background in platform engineering and production support
Hands-on experience with:

Red Hat Linux
OpenShift and Kubernetes
Java and Python
Microservices architectures and Spring Boot

Experience designing and maintaining observability dashboards, including:

Grafana
Splunk
SPLOC
AppDynamics

Experience with observability alerts, incident response, and on-call support, leveraging tools such as:

AIOps platforms
ServiceNow
BigPanda or similar incident management tools

Experience with:

React.js
Apache
Kafka
Relational databases

Strong understanding of distributed systems, cloud-native platforms, and microservices-based architectures
