Platform Engineer

Ua Consulting
Charing Cross, United Kingdom
13 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Charing Cross, United Kingdom

Tech stack

Kubernetes Security
Java
Amazon Web Services (AWS)
Configuration Management
DevOps
Identity and Access Management
Reliability Engineering
Ansible
Prometheus
Software Deployment
Data Logging
Scripting (Bash/Python/Go/Ruby)
React
Grafana
Kubernetes
Terraform

Job description

We're looking for a Platform Engineer with a strong grounding in site reliability engineering (SRE) principles to join our Platform team. You'll focus on maintaining and improving production reliability, automating operational tasks, and enhancing our observability stack. You'll work closely with SREs, support engineers, release managers, and incident managers to ensure our systems meet their SLO and SLA targets, as measured by our SLIs.

Key Responsibilities

Maintain and optimise production environments in AWS (EKS, EC2, RDS/Aurora, S3).

Develop and maintain Infrastructure as Code using Terraform and configuration management with Ansible.

Enhance monitoring, logging, and alerting using the Grafana stack (Prometheus, Loki, Tempo).

Participate in incident management, root cause analysis, and post-incident reviews.

Implement automation to reduce manual operational tasks and improve recovery time.

Contribute to the definition and tracking of SLIs, SLOs, and error budgets.

Collaborate with release and support teams to ensure smooth, reliable rollouts.

Maintain and improve documentation for operational runbooks and platform processes.

Requirements

Solid experience managing Kubernetes clusters (AWS EKS) in production.

Proficient with AWS services relevant to production workloads (EKS, EC2, RDS/Aurora, S3, IAM).

Hands-on experience with Infrastructure as Code using Terraform and configuration management using Ansible.

Strong experience with observability tools (Grafana, Prometheus, Loki, Tempo).

Understanding of SRE concepts (SLIs, SLOs, error budgets, capacity planning).

Comfortable working in incident and problem management processes.

Strong GitOps mindset for managing platform and configuration changes.

Good communication and documentation skills.

Qualifications (Desirable)

Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Security Specialist (CKS).

AWS Certified Solutions Architect - Associate and/or AWS Certified DevOps Engineer - Professional.

Nice-to-Have

Experience with Python scripting for automation and reliability tooling.

Knowledge of deploying Java and/or React applications in production.

Prior experience in high-volume, high-availability environments.
