Platform - Senior Site Reliability Engineer (Resilience)

Elastic

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Systems Engineering

Software as a Service

Linux

Distributed Systems

Elasticsearch

Software Maintenance

Reliability Engineering

Software Engineering

Cloud Platform System

Kubernetes

Terraform

Serverless Computing

Docker

Programming Languages

Job description

Platform - Senior Site Reliability Engineer

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale - unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter.

What is The Role:

As part of the Platform Engineering department, the SRE team is designing, building, scaling and maturing the multi-cloud platform for hosting internal and external services such as Elastic Cloud Hosted and Serverless.

What you will be doing:

Taking an engineering approach in leading technical initiatives for automating system engineering efforts to guarantee the reliability of the global Elastic infrastructure.

Growing our global Platform infrastructure to meet the increasing scaling demands by developing and maintaining software, tooling and automations.

Taking an inclusive approach to championing an environment focused on collaboration, operational excellence, and uplifting others.

Responding to major incidents and preventing repeated customer impact through prioritised problem management.

Requirements

What you bring:

Successes and lessons learned from striving for 'progress not perfection' in the name of Platform reliability.

A background in software engineering to collaborate with engineers to expertly identify, implement and deliver solutions.

Passion for developing solutions that involve inclusive communication methods to grow and strengthen partner and team relationships.

Bonus Points:

You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.

You have built or operated a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and the vital automation to support it.

You have written non-trivial programs in Golang or other programming languages.

You have worked with containerized services (such as Docker).

You have proven experience in leading and improving alerting, major incident management standard processes, and metrics systems.

You have experience in system administration with professional skills in Linux on distributed systems at scale.

You have diagnosed or designed, implemented and created solutions with the Elastic Stack.

You have experience thriving in a self-organizing, sharing, globally distributed team environment.

You help team members bring out the best in each other through coaching and mentoring.

Additional Information - We Take Care Of Our People

Elastic is an equal opportunity employer and is committed to creating an inclusive culture that celebrates different perspectives, experiences, and backgrounds. We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals.