Principal Engineer - Platform Engineering & Production Support (contract)
Job description
The ideal candidate is a seasoned DevOps and Site Reliability Engineering (SRE) professional with strong hands-on expertise in observability, incident management, and cloud platforms (OpenShift). This role will play a leading part in supporting production systems, preventing outages, and improving system reliability through automation, intelligent monitoring, and modern SRE practices.
- Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
- Design, build, and maintain advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, Prometheus, and SPLOC
- Proactively identify production risks through gap analysis, anomaly detection, and predictive alerting, preventing incidents before they occur
- Troubleshoot complex production issues across distributed microservices environments, applying deep technical expertise to reduce MTTR
- Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring
- Support applications running on OpenShift and cloud-native platforms, with a strong focus on reliability, scalability, and resiliency
- Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
- Participate in a 24x7 on-call rotation, demonstrating urgency, ownership, and accountability during incidents
- Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering
- Act as a trusted technical leader, able to rapidly shift priorities and manage competing demands in high-pressure environments
Requirements
Do you have experience in distributed computing?
We are seeking a Principal Engineer within the Platform Engineering team. This individual must be Day 1 ready, comfortable operating in fast-paced, production-critical environments, and capable of balancing multiple competing priorities.
- Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
- Strong background in platform engineering and production support
- Hands-on experience with:
  - Red Hat Linux
  - OpenShift and Kubernetes
  - Java and Python
  - Microservices architectures and Spring Boot
- Experience designing and maintaining observability dashboards, including:
  - Grafana
  - Splunk
  - SPLOC
  - AppDynamics
- Experience with observability alerts, incident response, and on-call support, leveraging tools such as:
  - AIOps platforms
  - ServiceNow
  - BigPanda or similar incident management tools
- Experience with:
  - React.js
  - Apache
  - Kafka
  - Relational databases
- Strong understanding of distributed systems, cloud-native platforms, and microservices-based architectures