Platform Engineering & Production Support
Job description
We are seeking a Principal Engineer for a platform engineering team. This role is responsible for stabilizing, scaling, and operating applications as they approach production release. The position requires a professional with a strong background in DevOps and Site Reliability Engineering (SRE), with expertise in observability, incident management, and cloud platforms. The individual must be prepared to operate in a fast-paced, production-critical environment.
- Lead production support efforts for a portfolio of over 20 applications to ensure stability and performance.
- Design and build monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, and Prometheus.
- Identify risks through gap analysis, anomaly detection, and predictive alerting to prevent production incidents.
- Troubleshoot complex production issues across distributed microservices environments.
- Drive the adoption of modern SRE practices, including automation and intelligent monitoring solutions.
- Support applications running on OpenShift and cloud-native platforms, focusing on reliability and scalability.
- Collaborate with development teams during release cycles to provide production-readiness guidance.
- Participate in a 24x7 on-call rotation to address incidents.
- Mentor engineers to elevate team capabilities in SRE, DevOps, and platform engineering.
Requirements
Experience: 10+ years in platform engineering and production support.
Technical Skills:
- 5+ years with Red Hat Linux, OpenShift, Kubernetes, Java, microservices, Spring Boot, and Python.
- 5+ years of experience creating observability dashboards with Grafana, Splunk, and AppDynamics.
- 5+ years of experience with observability alerting and incident handling, including AIOps tools such as ServiceNow or BigPanda.
- 4+ years with React.js, Apache Kafka, and relational databases.
- 4+ years with distributed systems, microservices architectures, and cloud-native platforms.
Preferred Qualifications
- Experience in the financial services industry.
- A background in development, particularly within Java-based ecosystems.
- Experience with AIOps tools like ServiceNow or BigPanda.
- Familiarity with Kafka and React.js.