Senior DevOps Engineer
Job description
The Senior DevOps Engineer owns everything related to server infrastructure, virtualisation, containerisation, storage architecture, and the DevOps toolchain. This is a deeply technical, hands-on role with broad autonomy to define how our systems are architected, built, and operated - with clear accountability to deliver results.
You will work alongside the Infrastructure Manager to align your work with operational needs and business priorities. You are expected to produce structured proposals and business cases for significant changes, present them to the Infrastructure Manager for approval, and lead the end-to-end implementation.
About the Environment
- On-prem, bare-metal compute infrastructure across four sites
- Proxmox VE for virtualisation with Ceph and NFS-based storage
- Docker Swarm for container orchestration - Kubernetes migration under evaluation
- Production database platforms: MongoDB, CouchDB, PostgreSQL, MySQL, ClickHouse
- Business-critical telecoms services with stringent uptime requirements
- Minimal public cloud usage - this is a bare-metal, on-prem environment
Virtualisation & Server Infrastructure
- Own and operate the Proxmox VE cluster estate across all data centre sites
- Manage and maintain bare-metal server lifecycle: provisioning, patching, hardware fault management, and decommissioning
- Maintain performance, resilience, and capacity across physical and virtual server infrastructure
- Manage VM templates, snapshots, resource allocation, and cluster health monitoring
Storage Architecture & Management
We have recently migrated from Ceph to NFS. The Senior DevOps Engineer will own the ongoing evaluation of our storage strategy, including:
- Assessing whether the current NFS-based approach is fit for purpose at scale
- Evaluating Ceph as a potential return candidate - producing a formal architecture proposal including design, resource requirements, risk analysis, and implementation plan
- Presenting the proposal to the Infrastructure Manager with a clear recommendation
- Leading the full implementation of whichever approach is agreed, including installation, configuration, and ongoing management
- Maintaining backup integrity, replication, and recovery procedures for all storage systems
Container Orchestration & DevOps
- Own and manage the Docker Swarm estate, including all running services and deployment workflows
- Lead the evaluation of a potential migration from Docker Swarm to Kubernetes
- Produce a detailed business case for the migration, covering architecture design, resource implications, migration strategy, risk, and phased rollout plan
- Present the business case to the Infrastructure Manager, who will escalate to the board for sign-off
- Lead the full implementation of the approved migration, including tooling setup, service migration, and handover documentation
- Own all Docker instances across the estate, including configuration, monitoring, and lifecycle management
Database Operations
- Manage production database platforms: MongoDB, CouchDB, PostgreSQL, MySQL, and ClickHouse
- Ensure replication, resilience, backup integrity, and tested recovery procedures are in place for all database systems
- Advise on database architecture and contribute to capacity planning
SIP Infrastructure & VoIP Platforms
We operate a multi-platform SIP estate spanning class 4 and class 5 switching, session border control, and hosted PBX infrastructure. The Senior DevOps Engineer is responsible for the operational maintenance and configuration of these platforms, working closely with the UC Engineering team on debugging, tracing, and stability.
- Operate, maintain, and configure the Kamailio session border controller (SBC) estate
- Administer and maintain FreeSWITCH-based infrastructure
- Support and maintain Asterisk-based platforms, including PBXware
- Operate and maintain the Sipwise C5 class 5 switch and the Yeti class 4 switch
- Perform SIP debugging and tracing to diagnose and resolve call flow, signalling, and media issues
- Work collaboratively with the UC Engineering team to ensure stability, performance, and continuity of VoIP platforms
- Support capacity planning, upgrades, and configuration changes across the SIP estate
Monitoring & Observability
- Own the monitoring platform stack across infrastructure and services
- Ensure alerting is effective, actionable, and covers all critical systems
- Maintain and improve observability tooling, dashboards, and incident detection capability
Architecture & Proposals
- Act as the technical authority on DevOps and infrastructure architecture decisions
- Produce structured proposals and business cases for significant changes, including rationale, design, risk, and implementation plan
- Collaborate with the Infrastructure Manager and other teams to align infrastructure with business priorities
- Advise on how systems should be architected and proactively identify areas for improvement
Cross-Functional Collaboration
- Work closely with the Infrastructure Manager to ensure DevOps and network operations are aligned
- Collaborate with development and service teams to support deployments and service operations
- Support the wider infrastructure team on DevOps-adjacent tasks and knowledge sharing
Performance KPIs
- Infrastructure uptime and availability for virtualisation, storage, and compute platforms
- Quality and timeliness of architecture proposals and business cases
- Change success rate: reduction in failed or rolled-back changes in the DevOps estate
- Backup integrity: all systems covered, tested, and recovery procedures validated
- Monitoring coverage: all critical systems instrumented, with effective alerting in place
- Documentation maturity: runbooks, diagrams, and SOPs maintained and current
Requirements
- Hands-on Proxmox VE cluster operations in production environments
- NFS storage management and administration
- Ceph storage - architecture, deployment, and operations
- Docker and Docker Swarm in production environments
- Database operations across MongoDB, CouchDB, PostgreSQL, MySQL, and ClickHouse - including replication, backup, and recovery
- Linux server administration at scale (bare metal and virtual)
- Monitoring and observability tooling
- Strong documentation discipline and ability to produce clear technical proposals
- Experience working in a structured, production-critical environment
Desirable Experience
- Kubernetes - design, implementation, and production operations
- Experience migrating workloads from Docker Swarm or similar to Kubernetes
- Familiarity with telecoms or VoIP infrastructure environments
- CI/CD pipeline design and management
- Infrastructure-as-code tooling (Ansible, Terraform, or similar) - Ansible is in use today, and Terraform adoption is just beginning
Working Style
- Technically deep, detail-driven, and reliable in live production environments
- Proactive: you identify problems before they become incidents and propose solutions
- Structured thinker who can translate technical complexity into clear recommendations
- Collaborative - you work effectively alongside the Infrastructure Manager and wider team
- Ownership mindset: you take full responsibility for your domain and follow through
- Growth-oriented: you actively develop your skills and stay current with the technology landscape