Software Engineer (Monitoring Platform)
Job description
As a Software Engineer (Monitoring Platform) here at SRT, you will be part of a small team responsible for designing, building, and maintaining our productised monitoring and observability platform. This platform is deployed across geographically distributed on-premises sites worldwide, serving clients with varying infrastructure and WAN capabilities.
Rather than simply using Prometheus and Grafana, you will engineer the frameworks, tooling, and configuration pipelines that keep our monitoring platform consistent, maintainable, and scalable across dozens of deployments. You will work closely with a lead observability engineer who oversees the platform's architecture, and you will have the authority to architect monitoring solutions and specify changes for other development teams to implement.
We are fortunate to have a team of highly experienced engineers, including UX designers, who can provide support and guidance as we extend the platform's capabilities to serve both internal engineers and external end-users.
- Build and maintain configuration generation frameworks using Ansible, Jinja2, and Jsonnet to ensure consistency across deployments
- Design and manage Docker Compose-based service orchestration for the monitoring stack
- Develop and maintain CI/CD pipelines (Jenkins) for building, testing, and packaging platform releases
Dashboards-as-Code & Visualisation
- Develop Grafana dashboards programmatically using the Grafana Foundation SDK (Python) and JSON provisioning
- Design reusable, templated dashboard components that can be configured per-deployment
- Collaborate with engineering and product teams to create tailored visualisations for both engineers and end-users
Monitoring Architecture & Design
- Design and configure Prometheus-based metric collection, including recording rules, alerting rules, and service discovery
- Develop and maintain metric exporters for application and system-level data
- Architect monitoring solutions and produce specifications for implementation by other development teams
Tooling & Automation
- Build and maintain Python and Bash tooling for deployment, bundling, and platform operations
- Develop automation to support environment-specific configuration layering and threshold management
- Contribute to the platform's packaging and distribution pipeline
Requirements
Required Skills & Experience
- Strong software engineering fundamentals - you write clean, well-structured, maintainable code regardless of language. You understand principles like separation of concerns, composability, and DRY, and you apply them to everything from Python libraries to Bash scripts to YAML templates
- Proven experience with Prometheus (including PromQL) and Grafana in production environments
- Experience with configuration management and generation tools (Ansible, Jinja2, or similar)
- Proficiency in Python and Bash in a Linux environment
- Experience with Docker and container orchestration (Docker Compose)
- Strong knowledge of Linux-based systems
- Familiarity with CI/CD pipelines (Jenkins or similar)
- Ability to think architecturally - designing solutions that are consistent, scalable, and maintainable across multiple deployments
- Comfortable working autonomously in a small team with significant ownership over your work
Desirable Skills
- Experience with Grafana-as-code approaches (Grafana Foundation SDK, Grafonnet, or JSON provisioning)
- Familiarity with Jsonnet for configuration generation
- Experience with Thanos or other long-term metric storage solutions
- Knowledge of SNMP-based monitoring
Within SRT, the role title for this position will be System Monitoring & Observability Engineer.