Senior Site Reliability Engineer - Growth
Job description
Our Growth business unit is responsible for building and scaling the experiences that drive Kraken's user base, covering the Onboarding, Acquire, and Engage teams. As part of this team, you will help ensure the reliability, scalability, and performance of the systems that power our growth initiatives.
As a Senior Site Reliability Engineer, you will partner with development teams to manage infrastructure, improve CI/CD pipelines, and support operational excellence across Growth. You will bring your expertise in infrastructure, monitoring, and automation to ensure our services are performant, resilient, and continuously improving.
- Manage and support infrastructure for Growth teams, including Nomad and the wider HashiStack, databases, and other underlying systems
- Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
- Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
- Participate in incident response and post-incident reviews to improve system resilience
- Consult with teams on performance, monitoring, and alerting best practices
- Build tooling, automation, and dashboards to improve observability and empower development teams
- Collaborate with developers, QA, and product managers to streamline development and release cycles
- Support a fully distributed team operating across multiple timezones
Requirements
- Strong experience managing infrastructure with Consul, Vault, and Terraform
- Proficiency with databases (SQL and NoSQL) and experience operating them in production
- Proficient in Git source version control and CI/CD configuration
- Deep understanding of monitoring and alerting systems, preferably Prometheus and Grafana
- Ability to debug complex issues involving distributed systems, networks, and Linux operating systems
- Experience with containerization and orchestration (Docker, Nomad, Kubernetes a plus)
- Strong scripting skills (e.g., Bash, Python, or Go)
- Self-starter with the ability to thrive while working independently and remotely in a fast-paced environment
- Ability to collaborate effectively with multiple teams and switch context across projects
- Interest in security and consideration of the security implications of development and operational decisions
- Experience with benchmarking, performance tuning, and identifying system bottlenecks
- Familiarity with incident management best practices and tooling
- Interest in lower-level programming languages such as Rust
- Experience integrating with APIs (GitLab, Jira, Slack)
- Background working with distributed systems and technologies (Kafka, gRPC, Redis, etc.)
- Passion for building reliable, user-facing systems that scale