Senior Site Reliability Engineer
ICEO - Venture Builder
2 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: € 90K
Job location: Remote
Tech stack
Secure Shell (SSH)
Java
Amazon Web Services (AWS)
Apache HTTP Server
Confluence
JIRA
Azure
Bash
C++
Ubuntu (Operating System)
Continuous Integration
Debian Linux
Linux
DevOps
Document Management Systems
DNS
Elasticsearch
Fault Tolerance
HTTP Secure
Java Virtual Machine (JVM)
Python
PostgreSQL
Linux System Administration
Nginx
Node.js
OpenVPN
Redis
Reliability Engineering
Prometheus
Systems Architecture
TCP/IP
Tripwire
Wide Area Networks
Data Logging
Load Balancing
Okta
Fluentd
Istio
Grafana
Reliability of Systems
Firewalls (Computer Science)
Containerization
Kubernetes
Kafka
Bitbucket
Kibana
REST
Terraform
Software Version Control
Google Meet
Docker
Go
Job description
- Own and lead the definition and execution of the SRE vision and strategy, ensuring alignment with business objectives and engineering priorities.
- Architect, maintain and develop infrastructure within GCP and GKE, focusing on performance, security, availability and reliability.
- Develop automated solutions for system reliability, capacity planning and incident response to minimize manual intervention.
- Collaborate with engineering and product teams to design and implement highly available, fault-tolerant systems.
- Own and deliver Service Level Objectives, Service Level Indicators and error budgets to enhance system reliability.
- Create and maintain documentation for implemented solutions.
- Mentor engineering teams on SRE principles, DevOps culture and best practices.
- Stay updated on industry trends, evaluating new tools and methodologies to improve system reliability.
- Balance security, performance and flexibility in all decisions.
- Participate in daily stand-ups, planning and other team meetings.
- Communication: Slack, Google Meet
- Work management: Jira
- Documentation: Confluence
- Repository: Bitbucket
- Automation & IaC: Bash, Python, Go, Terraform
- Observability: Prometheus, Grafana, Jaeger, Tempo, Loki
- CI/CD: Bitbucket Pipelines, Argo CD
- Containerization & orchestration: Docker, Kubernetes, Helm
- Security tooling: SOPS, Okta, tfsec, Trivy, Istio
- Stateful: PostgreSQL, TimescaleDB, Redis, Kafka, Elasticsearch
- HTTP: Nginx & Ingress-Nginx
Recruitment process
- Stage 1 - Screening with Talent Acquisition Partner (45 min).
- Stage 2 - Technical interview with Senior Developer (1 h, system architecture focus).
- Stage 3 - Interview with Lead of DevOps (1 h, hands-on Docker/Kubernetes/Linux scenarios).
- Stage 4 - Final interview with Head of Technology (30 min).
- Background check after offer extension.
Requirements
- 5+ years in a DevOps, SRE or similar role, working on a product with long-term platform maintenance.
- 10+ years of experience in technology.
- Independent platform management experience with autonomous decision making.
- Proficiency in at least one programming language (Python, Go, C++ or Java).
- Extensive experience with JVM, Node.js and related application maintenance.
- Advanced Linux administration (Debian/Ubuntu).
- Strong networking knowledge (LAN/WAN, firewall, proxy, load balancers, HTTP(S), DNS, SSH, TCP/IP, REST).
- Hands-on experience with observability tools (Prometheus, Grafana, OpenTelemetry, etc.).
- Knowledge of Kafka, Redis, Nginx and Docker.
- Experience with CI/CD and version control systems.
- Expertise in Kubernetes, Helm and Helm charts.
- Public cloud experience (GCP, AWS or Azure) including redundancy and disaster-recovery design.
- Experience designing, implementing and maintaining scalable, high-performance infrastructure (HPA, KEDA, affinity rules).
- Proficient in written and spoken English (B2 or higher).
Nice to have
- Financial domain experience.
- Interest in finance, trading and crypto.
- Experience with Argo CD and Argo Rollouts.
- Knowledge of Apache HTTP Server, OpenVPN.
- Logging tools experience (Kibana, Fluentd, Elasticsearch, Loki).
- Large PostgreSQL database configuration and optimization.
- SSO and Okta expertise.
- Self-motivated and able to work independently with minimal supervision.
Benefits & conditions
What we offer
- Remote-first company - work from anywhere.
- Flexible working hours with core time 11 am-3 pm CET.
- 38 days paid vacation plus 14 days paid sick leave.
- Autonomy to make decisions and explore new ideas.
- B2B salary of €75,000-90,000 per year plus on-call compensation.