Site Reliability Engineer

Womentech Network

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Tech stack

Amazon Web Services (AWS)

Azure

Software as a Service

Computer Security

Computer Programming

Python

Performance Tuning

Reliability Engineering

Akamai

Shell Script

Datadog

Data Logging

Cloud Platform System

Grafana

Containerization

Kubernetes

Terraform

Splunk

New Relic (SaaS)

Docker

Job description

In this role, you'll be a key contributor to our high-performance SaaS Cloud Platform.

As a Site Reliability Engineer, you will design and automate scalable infrastructure with infrastructure-as-code, implement and tune monitoring and observability systems to meet defined SLIs and SLOs, lead on-call incident response and post-mortem reviews to continuously improve system resilience, collaborate with development teams on performance optimization and capacity planning, and drive the automation of routine operational tasks to minimize toil and maximize uptime.

Developing, improving, and maintaining Guardicore's Cyber Security SaaS cloud platform
Leading problem-solving efforts for the entire technology stack in collaboration with other R&D teams
Establishing scalable, efficient, automated processes for large-scale data analyses
Working closely with other R&D teams to develop a strategy for long-term data platform architecture
Practicing infrastructure as code (IaC) and GitOps using technologies like Terraform and ArgoCD
Collaborating with Guardicore's development and research groups to constantly improve our platform and infrastructure
Participating in the on-call rotation supporting the applications and infrastructure
Developing and evolving our tooling, logging, monitoring, and alerting mechanisms to increase observability and transparency

Do what you love

To be successful in this role you will:
Have experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)

Requirements

Have excellent problem-solving skills and the ability to think critically about complex technical challenges and about optimizing production systems
Have experience with observability systems such as Datadog/Splunk/New Relic/Grafana, or similar
Have experience in shell scripting and/or a high-level programming language such as Python or Go
Have experience working with cloud environments like GCP, Linode, AWS, and Azure
Have excellent verbal and written English communication and presentation skills

Our ability to shape digital life today relies on developing exceptional people like you. The kind that can turn impossible into possible. We're doing everything we can to make Akamai a great place to work. A place where you can learn, grow and have a meaningful impact.

About the company

With our company moving so fast, it's important that you're able to build new skills, explore new roles, and try out different opportunities. There are so many different ways to build your career at Akamai, and we want to support you as much as possible. We have all kinds of development opportunities available, from programs such as GROW and Mentoring, to internal events like the APEX Expo and tools such as LinkedIn Learning, all to help you expand your knowledge and experience here.