Lead Site Reliability Engineer

Global Ltd
Charing Cross, United Kingdom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charing Cross, United Kingdom

Tech stack

JavaScript
Amazon Web Services (AWS)
Configuration Management
Code Review
Computer Programming
Continuous Integration
Software Debugging
Linux
DevOps
Disaster Recovery
DNS
GitHub
Redis
Reliability Engineering
Ruby
Datadog
Scripting (Bash/Python/Go/Ruby)
Git
Cloudflare
Functional Programming
CloudWatch
Terraform
Software Version Control
Docker

Job description

You will play a key role in maintaining and evolving FutureLearn's platform to ensure it is highly available, reliable, secure, and scalable as the business grows. Working closely with the Lead Technical Architect, SREs, and software engineers, you'll help shape the technical direction of our infrastructure while fostering a strong DevOps culture that enables teams to deliver high-quality services safely and efficiently.

We're looking for people who are curious, thoughtful, and eager to learn, with a genuine desire to use their experience to support and enable others. You'll need to communicate clearly, work effectively in a collaborative environment, and be comfortable operating autonomously when needed.

What does success look like?

Maintaining platform availability and reliability

  • Partner with the Lead Technical Architect to set and evolve the technical direction of our infrastructure, ensuring it scales to support business growth in a cost-effective manner.
  • Take responsibility for a platform that is secure, resilient, scalable, and cost-efficient.
  • Develop deep expertise in FutureLearn's technology stack and its practical application, including AWS (RDS, ECS, EC2, S3, Lambda), Cloudflare, Redis, DNS, Docker, and the wider infrastructure platform.
  • Use, maintain, and continuously improve observability tooling such as Datadog and AWS CloudWatch to monitor platform health, troubleshoot performance issues, and identify root causes.
  • Respond to incidents affecting the platform, including participation in the on-call rota.
  • Ensure disaster recovery and incident response processes are regularly tested and improved, designing exercises informed by industry best practices such as gamedays and chaos engineering.
  • Act as an expert in the tools used to manage infrastructure and CI/CD systems, including Terraform, GitHub Actions, and scripting languages.

Building a DevOps culture at FutureLearn

  • Own and continuously improve the developer experience, supporting SREs in refining how the FutureLearn application is developed, tested, and deployed so it is safer, faster, and easier to work on.
  • Champion CI/CD best practices, enabling engineers to reliably deliver high-quality services to production.
  • Empower software engineers to understand how to get their code into production and how to identify and debug performance issues.
  • Support engineers through pairing, teaching, mentoring, coaching, and code reviews, demonstrating the practices of an effective engineer.
  • Act as a subject matter expert for infrastructure and operational concerns across FutureLearn.

Requirements

  • Hands-on experience with containers and schedulers (Amazon ECS).
  • Experience using automated configuration management and infrastructure-as-code tools (Terraform).
  • A deep understanding of Linux, networking, and security.
  • Experience supporting database administration and performance, with a focus on scalability and maintainability.
  • A strong interest in automation and improving the developer experience.
  • Experience working closely with software engineers in an agile environment.
  • A solid understanding of Git and version control best practices to structure and communicate work effectively.

Preferred (not essential)

  • Programming experience in Ruby, JavaScript, or Go.
  • Experience managing relationships with external suppliers such as AWS or Cloudflare.

About the company

At FutureLearn, we're passionate about the power of lifelong learning. We help learners from all over the world progress in their careers and invest in their futures. We truly believe that upskilling is a worthy investment, and we hope to empower our learners to take control of their careers through personalised learning pathways, giving them progress at their fingertips.

Partnering with 260+ world-class educational partners, including prestigious universities, global brands and industry partners, we offer our 20 million-strong learner community the opportunity to discover and access flexible, high-quality online courses and degrees. We're not here just to teach new skills (although we do that well); we want to help transform lives.

FutureLearn is looking to build our teams with people who share our passion for lifelong learning, career empowerment and education for all. If that sounds like you, get in touch. You could help us achieve our biggest goal yet: becoming the world's best AI-powered, career-based learning platform and OPM.
