Site Reliability Engineer

Jobgether
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Cloud Computing
Data Infrastructure
DevOps
Fault Tolerance
PostgreSQL
MySQL
Performance Tuning
Redis
Reliability Engineering
Data Logging
System Availability
Delivery Pipeline
Grafana
Reliability of Systems
Kubernetes
Vertica

Job description

This role offers the opportunity to play a critical part in scaling and maintaining a high-growth platform used by a global audience. You will be responsible for ensuring system reliability, performance, and security as infrastructure demands continue to expand. Working in a fully remote and highly collaborative environment, you will partner closely with engineering teams to build resilient, scalable systems. This is a hands-on position suited for someone who thrives in fast-paced environments and enjoys solving complex operational challenges. You'll have a direct impact on uptime, system health, and long-term infrastructure strategy while contributing to automation and continuous improvement initiatives.

Accountabilities

  • Act as a primary responder for incidents and outages, ensuring high availability and rapid resolution of production issues.
  • Own and continuously improve monitoring, alerting, and logging systems to enhance observability and system health.
  • Manage and optimize database infrastructure, including MySQL, PostgreSQL, ClickHouse, and Redis.
  • Maintain and enhance server infrastructure and deployment pipelines for improved efficiency and reliability.
  • Collaborate with engineering teams to design and implement scalable, fault-tolerant systems.
  • Contribute to the development of internal SRE tools and automation to streamline operations.

Requirements

  • 3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles.
  • Strong expertise in AWS and Kubernetes, with hands-on experience managing cloud-native systems.
  • Proven experience handling incident response and maintaining production-grade systems.
  • Solid background in database operations, performance tuning, and optimization.
  • Familiarity with observability tools, monitoring frameworks, and logging best practices.
  • Strong communication skills and ability to work effectively in a remote, asynchronous environment.
  • Fluency in English (written and spoken).
  • Bonus: experience with SOC 2 compliance, scaling high-growth platforms, or working with ClickHouse or similar technologies.

Benefits & conditions

  • Competitive salary with equity and annual compensation reviews
  • Fully remote work environment with flexible working conditions
  • Generous paid time off (35 days annually) and sabbatical opportunities
  • Comprehensive healthcare coverage or reimbursement options
  • Parental leave to support family growth
  • Home office stipend for optimal remote setup
  • Learning and development budget for continuous skill enhancement
  • Performance-based bonus opportunities
  • Company-sponsored global retreats and team offsites
