Site Reliability Engineer - NS London
Job description
You will support and maintain essential services that underpin core mission applications, proactively enhancing their availability, performance and stability. You will be part of the 24/7 on-call rota, supporting critical production systems outside business hours, for which additional on-call allowances and overtime benefits will be paid. You will find innovative solutions to problems rather than undertake repetitive work, automating everything you can. You will work alongside development teams, advising them on good practice in designing and building systems and drawing on what you know works well. You will design and deploy monitoring products, creating bespoke tools where required, to provide comprehensive and intelligent observations that meet customer requirements and demonstrate the improvements the team are making on a daily basis. You will be well versed in the relationship between software and infrastructure, understanding the characteristics that make systems scalable and resilient to failure, and how to get the best out of the infrastructure they are deployed to. You will also participate in the wider DevOps/SRE community within the organisation.
Requirements
It is desirable for you to have experience in the areas below. However, what we value more for this role is excitement and enthusiasm for learning new technologies and tackling hard problems. Training, knowledge sharing and on-the-job development will enable you to plug any knowledge gaps.
o Software development in web technologies and object-oriented programming
o Database technologies such as Oracle SQL, Mongo, Postgres
o Familiarity with Linux and Windows command lines, e.g. Bash and PowerShell
o Monitoring large systems using technologies such as Grafana, Prometheus, ELK, Splunk
o Experience of working in Agile teams, and the tooling that supports them, e.g. Atlassian
o Diagnosing and troubleshooting application issues resulting in service outages
o Troubleshooting skills across different levels of the stack
o Understanding of ITIL
o Microservices architectures, Docker and container platforms such as OpenShift, Kubernetes
o Awareness of and insight into technology trends, with a view to adopting new cutting-edge tools