Site Reliability Engineer

Apetan Consulting
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Azure
Bash
Computer Security
Computer Programming
Computer Networks
DDoS Mitigation
DevOps
Distributed Systems
DNS
Monitoring of Systems
Hypertext Transfer Protocol (HTTP)
Internet Security
Python
Open Source Technology
Performance Tuning
Reliability Engineering
Prometheus
Zero Trust Network Access
TCP/IP
Software Vulnerability Management
Data Logging
Scripting (Bash/Python/Go/Ruby)
Google Cloud Platform
Istio
Grafana
CloudFormation
Containerization
Kubernetes
Infrastructure Automation Frameworks
Low Latency
Linkerd (Service Mesh)
Terraform
Docker
ELK

Job description

We are seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, and security of our internet-facing security platform. You will work on high-availability systems that protect and process large-scale network traffic, driving automation, observability, and incident response excellence.

Responsibilities

  • Design, build, and operate highly available, scalable, and secure infrastructure

  • Maintain uptime and performance of internet security platforms (e.g., WAF, DDoS protection, gateways)
  • Implement and improve observability (monitoring, logging, tracing, alerting)
  • Automate infrastructure provisioning and operational workflows
  • Lead incident response, root cause analysis, and postmortems
  • Collaborate with security, platform, and development teams to harden systems
  • Optimize system performance, latency, and cost efficiency
  • Define and enforce SLOs, SLIs, and error budgets

Requirements

  • Strong experience in Site Reliability Engineering, DevOps, or production engineering
  • Proficiency in Linux/Unix systems and networking fundamentals (TCP/IP, DNS, HTTP/S)
  • Experience with cloud platforms (AWS, Google Cloud Platform, or Azure)
  • Hands-on experience with containerization and orchestration (Docker, Kubernetes)
  • Strong scripting/programming skills (Python, Go, or Bash)
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Knowledge of monitoring tools (Prometheus, Grafana, ELK Stack, etc.)

  • Understanding of internet security concepts (TLS, firewalls, WAF, Zero Trust)
  • Experience mitigating DDoS attacks and handling large-scale traffic patterns
  • Familiarity with CDN, edge networks, and secure proxy architectures
  • Knowledge of vulnerability management and system hardening

  • Experience operating high-scale distributed systems
  • Familiarity with incident management tools and on-call practices
  • Exposure to compliance standards (SOC 2, ISO 27001, etc.)
  • Experience with service mesh (e.g., Istio, Linkerd)

  • Drive reliability best practices across teams
  • Mentor junior engineers and improve operational maturity
  • Lead critical incident handling and continuous improvement initiatives

  • Strong problem-solving and analytical thinking
  • Clear communication during high-pressure incidents
  • Ownership mindset with a focus on reliability and security

  • Experience working in cybersecurity or internet-scale platforms
  • Contributions to open-source SRE or security tooling