Senior SRE - Data & Middleware Observability & Incident Reduction Vice President

Citigroup, Inc.

Irving, United States of America

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$189K

Job location

Irving, United States of America

Tech stack

Data analysis

Tomcat

Server Applications

Cloud Computing

Cluster Analysis

Databases

Computer Engineering

Data Governance

Database Connection

Query Languages

Middleware

IBM Websphere Application Server

Python

Log Analysis

Enterprise Messaging Systems

Microsoft SQL Server

MongoDB

NoSQL

Oracle Applications

Pattern Recognition

Performance Tuning

Enterprise Data Management

Scripting (Bash/Python/Go/Ruby)

Enterprise Software Applications

Data Layers

Information Technology

Kafka

Dynatrace

Job description

The Senior Incident Operations & Optimization Specialist for Data & Middleware is a specialized technical leadership role requiring deep expertise in database technologies, messaging platforms, and application middleware. This position is essential to the Incident Reduction Program, as database and Middleware systems generate significant operational incidents while serving as critical infrastructure for enterprise applications.

You will be responsible for building automated incident remediation workflows and achieving measurable incident reduction through intelligent correlation, threshold optimization, and automation while ensuring the health and performance of business-critical data and Middleware platforms remain visible and protected. This role offers the opportunity to modernize observability and event management for the data layer and integration tier of enterprise architecture.

Incident & Alert Analysis: Analyze and optimize monitoring across all database and Middleware platforms to address high-volume, low-value alerts, identify patterns in incident generation, and determine root causes.
Intelligent Event Management: Develop and implement domain-specific correlation, de-duplication, and suppression rules on AIOps and event management platforms. Create logic that understands database cluster relationships, messaging dependencies, and application-to-database connections.
Automation & Self-Healing: Architect and develop automation playbooks for incident data enrichment and automated remediation of common database and Middleware issues, such as connection pool resets or service restarts.
Observability Enhancement: Identify monitoring gaps across the data and Middleware landscape, proposing enhancements to ensure comprehensive health monitoring and address blind spots in transactional flows.
Cross-Functional Collaboration: Partner closely with Database Administration (DBA), Middleware engineering, and application teams to validate correlation logic, build consensus on threshold changes, and provide expert guidance on event management best practices.
Quality Assurance: Continuously validate the effectiveness of implemented rules and automation, ensuring critical health indicators remain highly visible. Lead post-implementation reviews and drive iterative improvements.

Requirements

Experience: A minimum of 8 years of hands-on experience in database administration, Middleware engineering, or enterprise data platform operations.
Event Management & Incident Reduction: Proven experience in event management, alert tuning, and incident reduction for data and Middleware services, with measurable results. Direct, hands-on experience with modern AIOps and event management platforms is required.
Technical Expertise:

Deep knowledge of both relational (e.g., Oracle, SQL Server) and NoSQL (e.g., MongoDB) database technologies, including clustering, replication, and performance tuning.
Expertise in Middleware platforms, including messaging technologies (e.g., MQ, Kafka) and application servers (e.g., WebSphere, Tomcat).

Automation & Scripting: Hands-on experience developing robust automation solutions using relevant scripting languages (e.g., Python, Shell) and modern automation frameworks.
Data Analysis: Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms.
Problem-Solving & Analytical Skills: Excellent analytical abilities with a systematic approach to troubleshooting complex data platform architectures and correlating infrastructure issues with application impact.
Communication & Leadership: Exceptional communication skills with the ability to collaborate effectively with DBAs, Middleware engineers, and application teams, and to present technical concepts to diverse audiences.

Preferred Qualifications

An advanced degree (Master's) in a relevant technical field.
Relevant industry certifications (e.g., Database, Middleware, Cloud, Automation, ITIL).
Experience with Database as a Service (DBaaS) platforms and other database technologies.
Knowledge of data governance, security, and compliance requirements in a regulated environment.
Background in large-scale financial services environments.
Experience with modern observability platforms, distributed tracing, and infrastructure-as-code (IaC) principles.

Education

Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field.

Benefits & conditions

Primary Location Full Time Salary Range: $125,760.00 - $188,640.00

In addition to salary, Citi's offerings may also include, for eligible employees, discretionary and formulaic incentive and retention awards. Citi offers competitive employee benefits, including: medical, dental & vision coverage; 401(k); life, accident, and disability insurance; and wellness programs. Citi also offers paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays.
