Splunk & OpenShift Observability Engineer
CBS Butler Limited
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Compensation: £59K
Job location
Tech stack
Microsoft Windows
API
Cloud Computing
Configuration Management Databases
DevOps
Machine Learning
Node.js
Online Transaction Processing
OpenShift
Performance Tuning
Role-Based Access Control
Reliability Engineering
Prometheus
Macros
Data Ingestion
Fluentd
System Availability
Indexer
Kubernetes
Kafka
Splunk
Webhooks
Job description
- Design, deploy, and operate Splunk Enterprise and ITSI across hybrid Kubernetes/OpenShift platforms
- Onboard and normalise data at scale (HEC, Universal Forwarder, Deployment Server), aligning to CIM standards
- Build and optimise ITSI service models: service trees, KPIs, adaptive thresholds, NEAP policies, glass tables, deep dives, and health scoring
- Deliver OpenShift-focused executive and operational dashboards, including cluster/API/etcd health, node readiness and resource pressure, pod restart trends and noisy-neighbour detection, network and storage error visibility, and capacity, quota, and burst analysis
- Optimise search and platform performance (workload rules, DMA, summary indexing, scheduling hygiene, concurrency tuning)
- Implement intelligent alerting and automated routing into ITSM and ChatOps platforms, including enrichment, suppression windows, and maintenance scheduling
- Govern data ingestion and security controls (RBAC, retention, PII handling, TLS, token governance, index and role mapping)
- Integrate telemetry pipelines including OpenTelemetry, Prometheus, Fluentd/Fluent Bit/Vector, Kafka, CMDB and AIOps/ML solutions
- Drive SLO/KPI alignment, golden signal monitoring, rollout/rollback health validation, and executive reporting
Technologies:
- API
- Cloud
- CMDB
- ITSM
- Kafka
- Kubernetes
- Network
- OpenTelemetry
- OpenShift
- Prometheus
- RBAC
- Security
- Splunk
- Windows
- Node.js
- DevOps
More:
We are looking for a Splunk & OpenShift Observability Engineer to design, deploy, and optimise enterprise-grade monitoring across hybrid Kubernetes and OpenShift environments. This high-impact role will allow you to shape observability strategy, enhance service intelligence, and ensure platform reliability at scale while balancing performance, cost efficiency, and security governance. You will work at the intersection of platform engineering, observability, and service intelligence, helping to transform raw telemetry into actionable insight and improve operational maturity across a modern cloud-native estate.
Requirements
- Deep expertise in Splunk Enterprise (SPL mastery, CIM alignment, saved searches, macros, KV stores, index/retention/RBAC design, performance tuning)
- Strong experience with Splunk ITSI (service trees, KPIs, adaptive/time-based thresholds, NEAP tuning, Service Analyzer configuration)
- Proven OpenShift/Kubernetes observability experience across control-plane metrics, events, logs, workload correlation, and capacity management
- Hands-on experience with telemetry pipelines (OpenTelemetry/OTLP, Prometheus exporters, Fluentd/Fluent Bit/Vector, Kafka with TLS, HEC/UF/DS onboarding)
- Strong understanding of reliability engineering principles (golden signals, SLO design, namespace/application KPI mapping)
- Experience optimising performance and licensing costs using workload rules, DMA, and summary indexing
- Solid security and compliance knowledge (TLS/mTLS, certificate/token hygiene, PII controls, auditability, role/index mapping)
- Automation and integration expertise across ITSM, ChatOps, webhooks, CMDB enrichment, and AIOps tooling