Site Reliability Engineer
Job description
At NinjaOne, we are passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end users. We are looking for individuals with a passion for automation and observability who care about the quality and availability of our services.
Location - This is a fully remote position for candidates based in the UK or Germany, with the option to work hybrid if you prefer.
On-Call Requirements - Participate in our 24×7 on-call rotation, Scrum, and deployment planning.
We hire the best software engineers, but experience in our stack can't hurt: NinjaOne is built on Java, Kotlin, C++, and Postgres, supports millions of user endpoints, and runs as a scalable cloud service in AWS. Familiarity with large-scale datastore bottlenecks, asynchronous application design, and client-server architecture will help you.
What you'll be doing
- Diagnose and resolve complex application and infrastructure issues
- Participate in our 24×7 on-call rotation, Scrum, and deployment planning
- Perform Root Cause Analysis (RCA) and provide recommendations for application teams
- Improve availability and reduce customer impact using industry-leading observability tools
- Ensure best-practice and security-minded architecture by influencing design decisions
- Create and maintain technical documentation and SOPs
- Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure
- Other duties as needed
Requirements
- 5+ years of experience in Site Reliability Engineering roles
- Expert-level Linux administration, scripting, and troubleshooting
- Demonstrable knowledge of observability tools (Prometheus/Grafana, New Relic, Splunk, Datadog)
- Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route53, Fargate, ALB/NLB distributions, etc.)
- Extensive experience with cloud automation and infrastructure-as-code (IaC) toolsets, primarily CloudFormation but also including Terraform, Helm, and Ansible. CDK is a plus.
- Good understanding of containers, Fargate, Kubernetes, and distributed microservice architectures
- Passionate about automation, security, and self-service environments/portals
- Hands-on experience with CI/CD and SDLC (Software Development Life Cycle) processes
- Effective communication skills, both verbal and written
Benefits & conditions
- Access to our Corporate Benefits Platform (with discounts for brands such as Expedia, FitX, Zalando, and many more)
- Develop your skills through our renowned training platform
- Receive competitive compensation
- Collaborate with a curious, kind, international, and intercultural workforce
This position is NOT eligible for visa sponsorship.