Senior Software Engineer / SRE - Electronic Trading

Bloomberg
Charing Cross, United Kingdom
15 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Big Data
C++
Distributed Systems
Monitoring of Systems
Python
Reliability Engineering
Site Reliability Engineering Practices
Software Engineering
Diagnostic Tools
GitHub Copilot
Grafana
Spark
Reliability of Systems
Generative AI
Information Technology
Dynatrace

Job description

Senior Software Engineers - SRE in Electronic Trading (ET) ensure our global enterprise products spanning fixed income, equities, and derivatives are resilient and observable. This role focuses on building a culture of observability and resilience, and the platforms that support it, to prevent market disruptions for global traders. We specialize in proactive anomaly detection, providing advanced performance insights and best-practice guidance. Our team collaborates with application developers to define meaningful SLOs, implement chaos engineering, and build diagnostic tools that mitigate architectural risks as our platforms scale.

What's in it for you?

You will have the autonomy to drive reliability initiatives end-to-end, influencing the reliability strategy for critical global trading systems. By championing modern SRE practices and automation, you will fundamentally transform how we manage system stability. In your day-to-day, you'll develop frameworks for tracking reliability metrics, collaborate on system health reports, and build libraries that standardize alerting and incident response. You will also use failure injection and chaos testing to validate system performance under real-world stress. Our teams primarily build software using Python.

We'll trust you to:

  • Define and promote standards for observability, alerting, and incident response.
  • Develop self-maintaining tools using statistical analysis, health metrics, and distributed tracing.
  • Embed resiliency best practices into the full software development lifecycle.
  • Lead initiatives to mitigate risks related to performance, capacity, and scale.
  • Translate technical findings into actionable insights for engineers and stakeholders.
  • Automate operational tasks to enhance the safety and scalability of our infrastructure.

Requirements

  • Professional experience with Python or C++.
  • Strong collaboration and communication skills.
  • An understanding of distributed systems and system reliability.
  • Familiarity with SLOs, SLIs, and SLAs.
  • A degree in Computer Science, Engineering, or equivalent practical experience.

We'd love to see:

  • Experience in an SRE, Reliability or Production Engineering role.
  • Deep knowledge of system health assessment and building effective alerting.
  • Hands-on experience with monitoring tools (e.g., Grafana, Humio) and chaos engineering.
  • Familiarity with leveraging Generative AI (e.g., GitHub Copilot, Gemini) to accelerate development.
  • Experience with big data technologies like Apache Spark or Amazon S3.
