Senior SRE - Data & Middleware Observability & Incident Reduction Vice President

Citigroup, Inc.
Irving, United States of America
4 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $189K

Job location

Irving, United States of America

Tech stack

Data analysis
Tomcat
Server Applications
Cloud Computing
Cluster Analysis
Databases
Computer Engineering
Data Governance
Database Connection
Query Languages
Middleware
IBM Websphere Application Server
Python
Log Analysis
Enterprise Messaging Systems
Microsoft SQL Server
MongoDB
NoSQL
Oracle Applications
Pattern Recognition
Performance Tuning
Enterprise Data Management
Scripting (Bash/Python/Go/Ruby)
Enterprise Software Applications
Data Layers
Information Technology
Kafka
Dynatrace

Job description

The Senior Incident Operations & Optimization Specialist for Data & Middleware is a specialized technical leadership role requiring deep expertise in database technologies, messaging platforms, and application middleware. This position is essential to the Incident Reduction Program, as database and Middleware systems generate a significant volume of operational incidents while serving as critical infrastructure for enterprise applications.

You will be responsible for building automated incident remediation workflows and achieving measurable incident reduction through intelligent correlation, threshold optimization, and automation while ensuring the health and performance of business-critical data and Middleware platforms remain visible and protected. This role offers the opportunity to modernize observability and event management for the data layer and integration tier of enterprise architecture.

  • Incident & Alert Analysis: Analyze and optimize monitoring across all database and Middleware platforms to address high-volume, low-value alerts, identify patterns in incident generation, and determine root causes.
  • Intelligent Event Management: Develop and implement domain-specific correlation, de-duplication, and suppression rules on AIOps and event management platforms. Create logic that understands database cluster relationships, messaging dependencies, and application-to-database connections.
  • Automation & Self-Healing: Architect and develop automation playbooks for incident data enrichment and automated remediation of common database and Middleware issues, such as connection pool resets or service restarts.
  • Observability Enhancement: Identify monitoring gaps across the data and Middleware landscape, proposing enhancements to ensure comprehensive health monitoring and address blind spots in transactional flows.
  • Cross-Functional Collaboration: Partner closely with Database Administration (DBA), Middleware engineering, and application teams to validate correlation logic, build consensus on threshold changes, and provide expert guidance on event management best practices.
  • Quality Assurance: Continuously validate the effectiveness of implemented rules and automation, ensuring critical health indicators remain highly visible. Lead post-implementation reviews and drive iterative improvements.

Requirements

  • Experience: A minimum of 8 years of hands-on experience in database administration, Middleware engineering, or enterprise data platform operations.
  • Event Management & Incident Reduction: Proven experience in event management, alert tuning, and incident reduction for data and Middleware services, with measurable results. Direct, hands-on experience with modern AIOps and event management platforms is required.
  • Technical Expertise:
      • Deep knowledge of both relational (eg, Oracle, SQL Server) and NoSQL (eg, MongoDB) database technologies, including clustering, replication, and performance tuning.
      • Expertise in Middleware platforms, including messaging technologies (eg, MQ, Kafka) and application servers (eg, WebSphere, Tomcat).
  • Automation & Scripting: Hands-on experience developing robust automation solutions using relevant scripting languages (eg, Python, Shell) and modern automation frameworks.
  • Data Analysis: Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms.
  • Problem-Solving & Analytical Skills: Excellent analytical abilities with a systematic approach to troubleshooting complex data platform architectures and correlating infrastructure issues with application impact.
  • Communication & Leadership: Exceptional communication skills with the ability to collaborate effectively with DBAs, Middleware engineers, and application teams, and to present technical concepts to diverse audiences.

Preferred Qualifications

  • An advanced degree (Master's) in a relevant technical field.
  • Relevant industry certifications (eg, Database, Middleware, Cloud, Automation, ITIL).
  • Experience with Database as a Service (DBaaS) platforms and other database technologies.
  • Knowledge of data governance, security, and compliance requirements in a regulated environment.
  • Background in large-scale financial services environments.
  • Experience with modern observability platforms, distributed tracing, and infrastructure-as-code (IaC) principles.

Education

  • Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field.

Benefits & conditions

Primary Location Full Time Salary Range: $125,760.00 - $188,640.00

In addition to salary, Citi's offerings may also include, for eligible employees, discretionary and formulaic incentive and retention awards. Citi offers competitive employee benefits, including: medical, dental & vision coverage; 401(k); life, accident, and disability insurance; and wellness programs. Citi also offers paid time off packages, including planned time off (vacation), unplanned time off (sick leave), and paid holidays.
