Site Reliability Engineer

OfficeSpace Software Inc.

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Agile Methodologies
Artificial Intelligence
Apache HTTP Server
Configuration Management
Databases
Continuous Integration
Linux
Python
PostgreSQL
Load Testing
MariaDB
MySQL
Nginx
Performance Tuning
Redis
Reliability Engineering
Ansible
Prometheus
Ruby
Datadog
Grafana
Database Performance
Indexer
Kubernetes
Infrastructure Automation Frameworks
Cloud Optimization
Puppet
Terraform

Job description

As a Senior Site Reliability Engineer, you'll enhance system performance and reliability, optimize databases, and implement AI-assisted solutions for operational efficiency.

You own the performance, reliability, and cost efficiency of OfficeSpace's production platform at scale. You keep our systems fast, resilient, and predictable while leading the shift from manual operations to AI-assisted reliability engineering. We provide the platform. You make it perform.

What You'll Do:

  • Drive measurable improvements in latency, throughput, and availability across a large-scale production environment.

  • Own system performance, from Linux internals to Kubernetes scheduling, and eliminate bottlenecks before customers feel them.

  • Define and enforce SLIs, SLOs, and error budgets that balance speed, reliability, and growth.

  • Partner with application engineers to profile code paths, improve execution efficiency, and harden services under real load.

  • Lead database performance optimization across queries, indexing, replication, and workload isolation.

  • Design and oversee AI-assisted load testing, stress testing, and capacity planning workflows.

  • Guide the migration from monolithic deployments to multi-tenant Kubernetes platforms.

  • Reduce infrastructure spend through architectural decisions, right-sizing, and intelligent scaling strategies.

  • Build and supervise automation for infrastructure provisioning, configuration management, and observability.

  • Set clear operational standards for reliability, performance, and incident response, and raise the bar for how we run production.

  • High-Performance Culture: At OfficeSpace, we believe in the power of accountability, focus, and drive. Our A-Player team members work together to deliver measurable, meaningful results. We recognize and reward those who push boundaries and achieve excellence.

  • Ownership and Accountability: We trust our employees to take full ownership of their roles, providing the autonomy to innovate and the support to succeed. We seek individuals who are self-motivated and thrive in an environment where they can drive impactful outcomes.

  • Technology-Forward: As a company invested in cutting-edge technology, we integrate AI and other advanced solutions across our platform to enhance productivity, customer experience, and process efficiency. Our team members are excited by the potential of AI and proactively explore ways it can drive our success.

  • Growth Mindset: Continuous learning and improvement are integral to our culture. We encourage our team to embrace challenges, seek knowledge, and develop both personally and professionally.

  • Innovation and Agility: We foster a dynamic, fast-paced environment where fresh ideas and bold solutions are celebrated. We embrace change and thrive on turning challenges into opportunities, with a team that is agile, proactive, and resilient.

  • Collaborative, Results-Driven Environment: We value purposeful collaboration that leads to shared success and stronger results. While our team members are independent, they recognize the value of working together to drive our mission forward.

  • Competitive Benefits and Rewards: OfficeSpace offers comprehensive and competitive benefits packages globally, designed to support our team's health, well-being, and financial security. We invest in our people so they can excel.

OfficeSpace is committed to building and promoting a diverse workforce and celebrates the unique qualities that individuals of various backgrounds and experiences offer. We are committed to basing all employment-related decisions upon valid job-related factors without regard to race, color, sex (including pregnancy, sexual orientation, and gender identity), age, religion, national origin, genetic information, military status, veteran status, physical or mental disability, or any other status protected by law.

Requirements

  • 7+ years operating and evolving large-scale production systems.

  • Deep Linux systems expertise with hands-on performance tuning across CPU, memory, disk, and networking.

  • Strong Python skills for automation, tooling, and AI-assisted systems workflows.

  • Production experience with Ruby/Rails ecosystems, including Puma and Sidekiq.

  • Proven ability to diagnose and resolve complex database performance issues (MySQL/MariaDB or PostgreSQL).

  • Advanced Kubernetes experience, including workload sizing, scheduling, and multi-tenant operations.

  • Infrastructure-as-code mastery using Terraform and Terragrunt.

  • Experience with configuration management tools such as Puppet or Ansible.

  • Strong observability instincts across metrics, logs, and traces using tools like Prometheus, Grafana, Datadog, or ELK.

  • AI fluency: comfortable supervising AI agents for analysis, testing, and reporting, and validating their outputs.

  • A builder mindset. You move fast, take ownership, and raise standards.

Preferred Background:

  • Scaling and refactoring monolithic applications under real production load

  • Extracting databases or stateful components from monoliths

  • Apache and Nginx tuning at scale

  • Redis performance optimization and operational management

  • CI/CD systems and GitOps workflows, including ArgoCD

  • Cloud cost optimization and FinOps-aligned operational practices
