Site Reliability Engineer

Visa
Cambridge, United Kingdom
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Cambridge, United Kingdom

Tech stack

Component-Based Software Engineering
User Authentication
Command-Line Interface
Software as a Service
Cloud Computing
Cloud Engineering
Configuration Management
Continuous Integration
Data Security
Linux
DNS
Elasticsearch
Revision Control Systems
Python
MongoDB
Networking Basics
Routing
PCI Data Security Standards
Reliability Engineering
Ansible
Prometheus
Shell Script
Software Deployment
Software Engineering
Istio
SaltStack
Grafana
Firewalls (Computer Science)
Git
Containerization
Kubernetes
HashiCorp
Kafka
Data Management
Terraform
Serverless Computing
Docker
Programming Languages

Job description

As a Site Reliability Engineer (Cloud Ops), you will help operate and continuously improve Featurespace's world-leading product, ARIC Risk Hub, delivered as a robust cloud-based SaaS solution. You will work as part of the Cloud Operations / SRE team to ensure our platform is reliable, scalable, measurable, repeatable, secure, and cost-effective.

You will participate in designing, developing, deploying, monitoring, supporting, documenting, and troubleshooting our SaaS platform, collaborating closely with engineering, data science, internal stakeholders, external vendors, and customers to deliver excellent service outcomes.

Responsibilities

We hire people with a willingness to adapt to a variable role. Along with the responsibilities below, we ask for ownership of any other duties as required.

  • Operate and support production deployments of ARIC Risk Hub SaaS, including deploying, maintaining, monitoring, upgrading, and troubleshooting platform and application components.
  • Build software and systems to manage platform infrastructure and applications.
  • Continuously evaluate and improve technology and operational processes to increase quality, reduce costs, and improve time-to-market.
  • Participate in service resilience and failure testing, including predictable and unpredictable failure scenarios.
  • Provide second-line operational support for SaaS customers, ensuring timely and high-quality issue resolution.
  • Gather service performance data and generate reports and insights to guide reliability and scalability improvements.
  • Develop, maintain, and document internal processes and operational runbooks.
  • Collaborate with engineering and data science teams to drive new and improved ARIC Risk Hub capabilities.
  • Participate in an on-call roster, including out-of-hours support as required.

This is a hybrid position. The expected number of days in the office will be confirmed by your hiring manager.

Requirements

  • Experience administering cloud infrastructure or supporting cloud applications (preferably AWS).
  • Working knowledge of Linux, shell scripting, and command-line tools.
  • Ability to write or maintain code in at least one high-level programming language (e.g., Python).
  • Understanding of networking fundamentals (e.g., DNS, routing, firewalls).
  • Familiarity with source control systems (e.g., Git).
  • Exposure to CI/CD concepts and pipelines.
  • Familiarity with monitoring, metrics, and alerting systems.
  • Experience operating and supporting production-grade services.
  • Ability to write clear technical documentation and follow defined operational processes.

Preferred

  • Infrastructure as Code and configuration management experience (e.g., Terraform, SaltStack, Ansible).
  • Experience with containerization (Docker) and Kubernetes (deploying or operating services).
  • Exposure to service mesh technologies (e.g., Istio).
  • Experience building or operating cloud-native or serverless applications.
  • Familiarity with observability and data platforms such as Prometheus, Grafana, MongoDB, Elasticsearch, Kafka, and HashiCorp Vault.
  • Understanding of application and data security fundamentals (authentication, authorization, encryption, TLS).
  • Awareness of regulated standards (e.g., PCI-DSS, SOC2, ISO27001).
  • Relevant industry experience supporting cloud-based SaaS platforms in production environments.
  • Excellent interpersonal and communication skills, with the ability to collaborate across teams and organizations.
  • Strong attention to detail and a proactive, best-practice-driven approach to work.
  • Passion for learning new skills and technologies and staying current with industry developments.
  • Curiosity, innovation, and enthusiasm for solving complex problems.
  • Strong time-management skills and the ability to prioritize effectively.

About the company

Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories, dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale - tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters - to you, to your community, and to the world. Progress starts with you.

At Featurespace (a Visa company), we strive to be the world's best software company at protecting our clients and their customers from fraud attacks and fighting financial crime. We do that with personality, heart and professionalism, cultivating an innovative, fun and positive team atmosphere where everybody can contribute to solving our clients' problems in new, innovative ways.
