Senior Site Reliability Engineer - Production Engineering (Remote - United Kingdom)

Yelp
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Java
Amazon Web Services (AWS)
C++
Ubuntu (Operating System)
Software Debugging
Linux
DNS
Hypertext Transfer Protocols (HTTP)
Python
MySQL
Open Source Technology
Reliability Engineering
Ansible
Prometheus
Ruby
Systems Integration
TCP/IP
TypeScript
Rust
Grafana
Git
Kubernetes
Cassandra
Puppet
Terraform
Splunk
Docker
Jenkins
Go
Programming Languages

Job description

  • Work with engineers across Yelp to support new features and services.
  • Integrate tools to monitor platform stability and performance.
  • Help scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs.
  • Ensure the reliability of Yelp's primary datastores (MySQL and Cassandra).
  • Troubleshoot site issues using industry-leading tools like Splunk, Grafana, and Prometheus.
  • Automate everything with Python, Puppet, Git, Jenkins, Terraform and more!
  • Develop custom tools when off-the-shelf solutions don't work at our scale, and contribute upstream to open source projects.
  • Design and implement new systems, tests, and procedures.
  • Foster and build a fun, diverse, and inclusive culture that reflects Yelp's values.
  • Bring your curiosity, tenacity and experience.
  • Participate in light on-call rotations - we have geographically distributed SRE teams for follow-the-sun support, which reduces the need to be on-call 24 hours a day.
  • Mastery of Linux (we use Ubuntu but any distro is fine), with a view to debugging ambiguous OS behaviours.
  • Command of your favorite modern programming language (Python, TypeScript, Ruby, Go, Rust, Java, C++, etc.) and an appreciation for delivering safe and secure services.
  • A solid understanding of the fundamental technologies for delivering services on the Internet (TCP/IP, HTTP, DNS, etc.).
  • Experience with public cloud platforms (we use AWS and GCP, but others are also fine) and related tooling (Terraform, Puppet, Chef, Ansible, etc.).
  • Experience with Linux containerisation and orchestration (e.g., Docker, Podman and Kubernetes).
  • Self-motivated to investigate, fix and improve Yelp in an ever-changing environment.
  • Leading, collaborating on, and sharing technical activities with teams.
  • Own the total lifecycle of a system.

A Basic criminal background check via AccessNI is required for employment. Yelp complies with the AccessNI Code of Practice. Having a criminal record will not necessarily prevent a candidate from working with Yelp. Yelp will consider the nature of the position together with the circumstances and background of the candidate's offences or other information contained on a disclosure certificate. AccessNI's Privacy Policy is available here. Yelp's Criminal Background Check Policy is available upon request.

Note: Yelp does not accept agency resumes. Please do not forward resumes to any recruiting alias or employee. Yelp is not responsible for any fees related to unsolicited resumes.

Recruiting and Applicant Privacy Notice

Requirements

Do you have experience in Ubuntu?

Benefits & conditions

  • Competitive salary, a pension scheme, and an optional employee stock purchase plan.
  • 25 days paid holiday (rising to 29 with service), plus one floating holiday.
  • £150 monthly reimbursement to help cover remote working expenses.
  • £81 caregiver reimbursement to support dependent care for families.
  • Private health insurance, including dental and vision.
  • Flexible working hours and meeting-free Wednesdays.
  • Regular 3-day Hackathons, bi-weekly learning groups, and productivity spending to support and encourage your career growth.
  • Opportunities to participate in digital events and conferences.
  • £81 per month to use toward qualifying wellness expenses.
  • Quarterly team offsites.

About the company

Yelp engineering culture is driven by our values: we're a cooperative team that values individual authenticity and encourages creative solutions to problems. All new engineers deploy working code their first week, and we strive to broaden individual impact with support from managers, mentors, and teams. At the end of the day, we're all about helping our users, growing as engineers, and having fun in a collaborative environment.

Do you want to build and manage scalable, self-healing, globally-distributed systems? Our Site Reliability engineers keep Yelp fast, available, and growing, connecting users to great local businesses. No matter how many times we get searched, scraped, scanned, spammed, pinged, paged, or queried, we gotta keep our cool - and keep the site running smoothly. We work in both the development and systems worlds, implementing key parts of the core architecture and supporting developers as they try to do the same. We get to tackle interesting challenges that you can only find at the kind of scale that serves over 100 million users per month. You'll work to empower Yelp: spinning up infrastructure should always be a git commit and a code review away, with automation and self-service being at the core of what we do.

This opportunity requires you to be located in the United Kingdom. We'd love to have you apply, even if you don't feel you meet every single requirement in this posting. At Yelp, we're looking for great people, not just those who simply check off all the boxes.
