SRE Observability Engineer - London

Citigroup Inc.
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Charing Cross, United Kingdom

Tech stack

Data analysis
Bash
Monitoring of Systems
OpenShift
Prometheus
Software Deployment
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Lightspeed
Grafana
Kubernetes Helm Charts
Backend
Kubernetes
User Administration

Job description

The Monitoring and Observability team is responsible for:

  • Operating with a global footprint.
  • Collaborating across various organizations within Citi to understand and develop observability solutions for enterprise-wide deployment at scale.
  • Managing the legacy monitoring stack across the Production Management organization within Citi.
  • Driving the strategic delivery of end-to-end Observability solutions in Citi.
  • Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions.
  • Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services.
  • Persuading and influencing others through strong and comprehensive communication and diplomacy skills.
  • Performing other duties and functions as assigned.

Requirements

  • OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking.
  • Grafana & Observability Stack:
      • Proficiency in administering ITRS Geneos at scale.
      • Proficiency in administering Grafana (user management, data sources, dashboards, alerts).
      • Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces).
      • Experience with Prometheus for metric collection and PromQL for querying.
  • Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies.
  • Technical Documentation: Ability to create clear and concise documentation for systems and processes.

Desired skills

  • Application Deployment: Ability to deploy applications using Lightspeed Enterprise.
  • Google Cloud Operations: Experience with Google Cloud operations.
  • Scripting & Automation: Experience with Bash or Python scripting for automating operational tasks.
