Senior Software Engineer / SRE - Electronic Trading

Bloomberg

Charing Cross, United Kingdom

15 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)

Big Data

C++

Distributed Systems

Monitoring of Systems

Python

Reliability Engineering

Site Reliability Engineering Practices

Software Engineering

Diagnostic Tools

GitHub Copilot

Grafana

Spark

Reliability of Systems

Generative AI

Information Technology

Dynatrace

Job description

Senior Software Engineers - SRE in Electronic Trading (ET) ensure that our global enterprise products, spanning fixed income, equities, and derivatives, are resilient and observable. This role focuses on building a culture and platforms of observability and resilience to prevent market disruptions for global traders. We specialize in proactive anomaly detection, providing advanced performance insights and best-practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks as our platforms scale.

What's in it for you?

You will have the autonomy to drive reliability initiatives end-to-end, influencing the reliability strategy for critical global trading systems. By championing modern SRE practices and automation, you will fundamentally transform how we manage system stability. In your day-to-day, you'll develop frameworks for tracking reliability metrics, collaborate on system health reports, and build libraries that standardize alerting and incident response. You will also use failure injection and chaos testing to validate system performance under real-world stress. Our teams primarily build software using Python.

We'll trust you to:

Define and promote standards for observability, alerting, and incident response.
Develop self-maintaining tools using statistical analysis, health metrics, and distributed tracing.
Embed resiliency best practices into the full software development lifecycle.
Lead initiatives to mitigate risks related to performance, capacity, and scale.
Translate technical findings into actionable insights for engineers and stakeholders.
Automate operational tasks to enhance the safety and scalability of our infrastructure.

Requirements

Professional experience with Python or C++.
Strong collaboration and communication skills.
An understanding of distributed systems and system reliability.
Familiarity with SLOs, SLIs, and SLAs.
A degree in Computer Science, Engineering, or equivalent practical experience.

We'd love to see:

Experience in an SRE, Reliability, or Production Engineering role.
Deep knowledge of system health assessment and building effective alerting.
Hands-on experience with monitoring tools (e.g., Grafana, Humio) and chaos engineering.
Familiarity with leveraging Generative AI (e.g., GitHub Copilot, Gemini) to accelerate development.
Experience with big data technologies like Apache Spark or Amazon S3.