Site Reliability Engineer (SRE)

Valstro
Charing Cross, United Kingdom
18 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Charing Cross, United Kingdom

Tech stack

Amazon Web Services (AWS)
Systems Engineering
Azure
Bash
Python
Reliability Engineering
Prometheus
Software Engineering
Datadog
Data Logging
Google Cloud Platform
Cloud Platform System
Grafana
Reliability of Systems
Containerization
Kubernetes
Information Technology
Terraform
Docker
Go

Job description

Valstro is looking for a Site Reliability Engineer (SRE) to join our team! This person will help ensure the reliability, availability, and performance of our cloud-native trading platform. The role entails building and maintaining infrastructure, automating processes, and working closely with the Development and Platform teams to ensure seamless integration and deployment of the service.

The successful candidate will serve as an essential link between the wider organization, executive leadership, and external vendors. Their responsibilities will include ensuring system reliability, building and maintaining monitoring solutions for both production and UAT systems, automating operational tasks, responding to incidents, and continuously improving systems and processes.

This is a remote position that will report to the Site Reliability Lead.

What will you be doing?

  • Act as a key intermediary between engineering, executive leadership, and external vendors.
  • Ensure the reliability, availability, and performance of our cloud-based trading solutions.
  • Develop and maintain monitoring solutions to track system performance and reliability.
  • Automate operational tasks to improve efficiency and reduce manual intervention.
  • Collaborate with development teams to ensure seamless integration and deployment.
  • Respond to incidents and troubleshoot issues to minimize downtime.
  • Continuously improve systems and processes to enhance reliability and performance.
  • Participate in on-call rotations to provide 24/7 support for critical systems.

Requirements

  • Experience with Terraform.
  • A bachelor's degree.
  • 3+ years of experience supporting production-level systems.

  • Strong experience in site reliability engineering, systems engineering, or a related field.
  • Proficiency in cloud-based infrastructure (e.g., AWS, Azure, or Google Cloud).
  • Experience with monitoring and logging tools (e.g., ELK, LGTM, Prometheus, Datadog).
  • Expertise in automation and scripting (e.g., Golang, Python, Bash, Terraform).
  • Knowledge of containerization and orchestration (e.g., Docker, Kubernetes).
  • Ability to effectively communicate and liaise between stakeholders, including internal teams, executive management and external vendors.
  • Strong troubleshooting and problem-solving skills.
  • Experience in establishing and enhancing reliability engineering practices and processes.
  • Capable of operating effectively in a dynamic organizational environment with high delivery and quality expectations.

Fintech experience is a bonus.

Technical

  • A recent bachelor's degree in Computer Science, Software Engineering, or a related field.
  • Knowledge of site reliability engineering (SRE) practices.
  • Knowledge of observability and tooling, particularly the Grafana stack.

Benefits & conditions

Valstro offers an excellent benefits package, including pension or 401(k) plans, unlimited PTO, and highly competitive compensation. Our leadership team brings a wealth of experience and deep industry knowledge, and despite being a young company, we believe we have carefully dialed in our product-market fit.
