Site Reliability Engineer

Visa

Cambridge, United Kingdom

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Cambridge, United Kingdom

Tech stack

Component-Based Software Engineering

User Authentication

Command-Line Interface

Software as a Service

Cloud Computing

Cloud Engineering

Configuration Management

Continuous Integration

Data Security

Linux

DNS

Elasticsearch

Revision Control Systems

Python

MongoDB

Networking Basics

Routing

PCI Data Security Standards

Reliability Engineering

Ansible

Prometheus

Shell Script

Software Deployment

Software Engineering

Istio

SaltStack

Grafana

Firewalls (Computer Science)

Git

Containerization

Kubernetes

HashiCorp

Kafka

Data Management

Terraform

Serverless Computing

Docker

Programming Languages

Job description

As a Site Reliability Engineer (Cloud Ops), you will help operate and continuously improve Featurespace's world-leading product, ARIC Risk Hub, delivered as a robust cloud-based SaaS solution. You will work as part of the Cloud Operations / SRE team to ensure our platform is reliable, scalable, measurable, repeatable, secure, and cost-effective.

You will participate in designing, developing, deploying, monitoring, supporting, documenting, and troubleshooting our SaaS platform, collaborating closely with engineering, data science, internal stakeholders, external vendors, and customers to deliver excellent service outcomes.

Responsibilities

We hire people with a willingness to adapt to a variable role. Along with the responsibilities below, we ask for ownership of any other duties as required.

Operate and support production deployments of ARIC Risk Hub SaaS, including deploying, maintaining, monitoring, upgrading, and troubleshooting platform and application components.
Build software and systems to manage platform infrastructure and applications.
Continuously evaluate and improve technology and operational processes to increase quality, reduce costs, and improve time-to-market.
Participate in service resilience and failure testing, including predictable and unpredictable failure scenarios.
Provide second-line operational support for SaaS customers, ensuring timely and high-quality issue resolution.
Gather service performance data and generate reports and insights to guide reliability and scalability improvements.
Develop, maintain, and document internal processes and operational runbooks.
Collaborate with engineering and data science teams to drive new and improved ARIC Risk Hub capabilities.
Participate in an on-call roster, including out-of-hours support as required.

This is a hybrid position. Your hiring manager will confirm the expected number of days in the office.

Requirements

Experience administering cloud infrastructure or supporting cloud applications (preferably AWS).
Working knowledge of Linux, shell scripting, and command-line tools.
Ability to write or maintain code in at least one high-level programming language (e.g., Python).
Understanding of networking fundamentals (e.g., DNS, routing, firewalls).
Familiarity with source control systems (e.g., Git).
Exposure to CI/CD concepts and pipelines.
Familiarity with monitoring, metrics, and alerting systems.
Experience operating and supporting production-grade services.
Ability to write clear technical documentation and follow defined operational processes.

Preferred

Infrastructure as Code and configuration management experience (e.g., Terraform, SaltStack, Ansible).
Experience with containerization (Docker) and Kubernetes (deploying or operating services).
Exposure to service mesh technologies (e.g., Istio).
Experience building or operating cloud-native or serverless applications.
Familiarity with observability and data platforms such as Prometheus, Grafana, MongoDB, Elasticsearch, Kafka, and HashiCorp Vault.
Understanding of application and data security fundamentals (authentication, authorization, encryption, TLS).
Awareness of regulated standards (e.g., PCI-DSS, SOC2, ISO27001).
Relevant industry experience supporting cloud-based SaaS platforms in production environments.
Excellent interpersonal and communication skills, with the ability to collaborate across teams and organizations.
Strong attention to detail and a proactive, best-practice-driven approach to work.
Passion for learning new skills and technologies and staying current with industry developments.
Curiosity, innovation, and enthusiasm for solving complex problems.
Strong time-management skills and the ability to prioritize effectively.

About the company

Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories. It is dedicated to uplifting everyone, everywhere by being the best way to pay and be paid. At Visa, you'll have the opportunity to create impact at scale - tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters - to you, to your community, and to the world. Progress starts with you.

At Featurespace (a Visa company), we strive to be the world's best software company at protecting our clients and their customers from fraud attacks and fighting financial crime. We do that with personality, heart and professionalism, cultivating an innovative, fun and positive team atmosphere where everybody can contribute to solving our clients' problems in new, innovative ways.