Site Reliability Engineer (SRE) / Application Production Support

Carlin Shayn

Alpharetta, United States of America

2 days ago

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Alpharetta, United States of America

Tech stack

Java

Jira

CA Workload Automation AE

Batch Processing

Cloud Computing

IBM DB2

Monitoring of Systems

Spring

Python

Shell

Log Analysis

Oracle Applications

Reliability Engineering

Standard SQL

Shell Script

SQL Databases

Enterprise Software Applications

System Availability

Grafana

Spring Boot

SAP Sybase ASE

Software Troubleshooting

Kibana

Splunk

PagerDuty

ServiceNow

Control-M

Job description

Morgan Stanley is seeking an experienced Site Reliability Engineer (SRE) / Application Production Support professional to support mission-critical financial applications. The ideal candidate will have strong expertise in production support, incident management, monitoring, troubleshooting, automation, and SRE best practices. This role requires hands-on experience with application support, log analysis, scheduling tools, and cloud technologies within enterprise environments.

Monitor and support production applications to ensure high availability and reliability.
Investigate, troubleshoot, and resolve application, infrastructure, and batch processing issues.
Analyze application logs and identify root causes for incidents and service disruptions.
Perform log analysis using Splunk, Kibana, and related monitoring tools.
Manage and support batch processing using scheduling tools such as Control-M and AutoSys.
Handle incidents, service requests, and problem management activities using ITSM processes.
Work with ticketing platforms such as ServiceNow, Jira, or Remedy.
Collaborate with development, infrastructure, and business teams to ensure timely issue resolution.
Participate in on-call support and incident response activities.
Develop automation scripts using Unix Shell or Python to improve operational efficiency.
Support observability, monitoring, and alert management initiatives.

Splunk
Kibana
SQL
DB2
Java
Unix/Linux
Python
Control-M
AutoSys
ServiceNow
Jira
Grafana
Loki
PagerDuty
BigPanda
Cloud Technologies

Requirements

Strong understanding of Site Reliability Engineering (SRE) principles.
Experience with Production Support in enterprise environments.
Experience in Incident Management, ITIL, and ITSM processes.
Hands-on experience with:

Splunk
Kibana
SQL
DB2
Java
Unix/Linux
Python or Shell Scripting

Experience with scheduling tools:

Control-M
AutoSys

Experience using ticketing tools:

ServiceNow
Jira
Remedy

Knowledge of Cloud Fundamentals.
Strong troubleshooting and root cause analysis skills.
Excellent communication and stakeholder management abilities.

Preferred Skills

5+ years of experience supporting enterprise applications.
Experience with:

Grafana
Loki
Oracle
Sybase

Knowledge of alert management platforms:

BigPanda
PagerDuty

Familiarity with:

Core Java
Spring Framework
Spring Boot
Spring Integration

Exposure to Synthetic Monitoring tools such as Apica.
Experience supporting applications within the Financial Services domain.

5+ years of Production Support, SRE, or Application Support experience.
Experience supporting mission-critical enterprise applications.
Financial Services or Banking domain experience preferred.
