Principal Engineer - Platform Engineering & Production Support (contract)

Wells Fargo
Irving, United States of America
12 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Irving, United States of America

Tech stack

Java
Apache HTTP Server
Relational Databases
DevOps
Distributed Systems
OpenShift
Red Hat Enterprise Linux - RHEL
Reliability Engineering
Site Reliability Engineering Practices
Prometheus
Cloud Platform System
React
Grafana
Spring Boot
MTTR
Reliability of Systems
Kubernetes
Kafka
Splunk
AppDynamics
ServiceNow
Microservices

Job description

The ideal candidate is a seasoned DevOps and Site Reliability Engineering (SRE) professional with strong hands-on expertise in observability, incident management, and cloud platforms (OpenShift). This role will play a leading part in supporting production systems, preventing outages, and improving system reliability through automation, intelligent monitoring, and modern SRE practices.

  • Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
  • Design, build, and maintain advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, Prometheus, and SPLOC
  • Proactively identify production risks through gap analysis, anomaly detection, and predictive alerting, preventing incidents before they occur
  • Troubleshoot complex production issues across distributed microservices environments, driving reduced MTTR through deep technical expertise
  • Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring
  • Support applications running on OpenShift and cloud-native platforms, with a strong focus on reliability, scalability, and resiliency
  • Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
  • Participate in a 24x7 on-call rotation, demonstrating urgency, ownership, and accountability during incidents
  • Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering
  • Act as a trusted technical leader, able to rapidly shift priorities and manage competing demands in high-pressure environments

Requirements

Do you have experience in distributed computing? We are seeking a Principal Engineer within the Platform Engineering team. This individual must be Day 1 ready, comfortable operating in fast-paced, production-critical environments, and capable of balancing multiple competing priorities.

  • Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
  • Strong background in platform engineering and production support
  • Hands-on experience with:
      • Red Hat Linux
      • OpenShift and Kubernetes
      • Java and Python
      • Microservices architectures and Spring Boot
  • Experience designing and maintaining observability dashboards, including:
      • Grafana
      • Splunk
      • SPLOC
      • AppDynamics
  • Experience with observability alerts, incident response, and on-call support, leveraging tools such as:
      • AIOps platforms
      • ServiceNow
      • BigPanda or similar incident management tools
  • Experience with:
      • React.js
      • Apache
      • Kafka
      • Relational databases
  • Strong understanding of distributed systems, cloud-native platforms, and microservices-based architectures