Site Reliability Engineer

Womentech Network
1 month ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Tech stack

Amazon Web Services (AWS)
Azure
Software as a Service
Computer Security
Computer Programming
Python
Performance Tuning
Reliability Engineering
Akamai
Shell Script
Datadog
Data Logging
Cloud Platform System
Grafana
Containerization
Kubernetes
Terraform
Splunk
New Relic (SaaS)
Docker

Job description

In this role, you'll be a key contributor to our high-performance SaaS Cloud Platform.

As a Site Reliability Engineer, you will design and automate scalable infrastructure with infrastructure-as-code, implement and tune monitoring and observability systems to meet defined SLIs and SLOs, lead on-call incident response and post-mortem reviews to continuously improve system resilience, collaborate with development teams on performance optimization and capacity planning, and drive the automation of routine operational tasks to minimize toil and maximize uptime.

  • Developing, improving, and maintaining Guardicore's Cyber Security SaaS cloud platform

  • Leading problem-solving efforts across the entire technology stack in collaboration with other R&D teams

  • Establishing scalable, efficient, automated processes for large-scale data analyses

  • Working closely with other R&D teams to develop a strategy for long-term data platform architecture

  • Practicing infrastructure as code (IaC) and GitOps using technologies like Terraform and ArgoCD

  • Collaborating with Guardicore's development and research groups to constantly improve our platform and infrastructure

  • Participating in the on-call rotation supporting the applications and infrastructure

  • Developing and evolving our tooling, logging, monitoring, and alerting mechanisms to increase observability and transparency

Requirements

Do what you love

To be successful in this role you will:

  • Have experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)

  • Have excellent problem-solving skills and the ability to think critically about complex technical challenges and optimize production systems
  • Have experience with observability systems such as Datadog/Splunk/New Relic/Grafana, or similar
  • Have experience in shell scripting and/or a high-level programming language such as Python or Go
  • Have experience working with cloud environments such as GCP, Linode, AWS, or Azure
  • Have excellent verbal and written English communication and presentation skills

Our ability to shape digital life today relies on developing exceptional people like you. The kind that can turn impossible into possible. We're doing everything we can to make Akamai a great place to work. A place where you can learn, grow and have a meaningful impact.

About the company

With our company moving so fast, it's important that you're able to build new skills, explore new roles, and try out different opportunities. There are many ways to build your career at Akamai, and we want to support you as much as possible. We offer a range of development opportunities, from programs such as GROW and Mentoring to internal events like the APEX Expo and tools such as LinkedIn Learning, all to help you expand your knowledge and experience here.
