Site Reliability Engineer

Axiom
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Cloud Computing
Linux
Disaster Recovery
GitHub
Reliability Engineering
Software Engineering
CircleCI
Pulumi
Large Language Models
Reliability of Systems
Backend
GitLab
Kubernetes
Data Analytics
Terraform
Serverless Computing
Docker
Go

Job description

Axiom's mission is to empower developers to get the best insights into their data, as fast as possible. We are a remote-first, globally distributed team building a cloud-native, serverless data analytics platform. Axiom changes the way developers and organizations think about their data: they can now send unlimited data with cost-effective storage and lightning-fast querying.

As a Site Reliability Engineer at Axiom, you will be pivotal in upholding our promise of superior reliability and performance to our customers. Collaborating with backend engineers and product teams, you will focus on building and operating scalable, reliable systems. At Axiom, SRE work centers on automating, measuring, and continuously improving the reliability and efficiency of our systems.

Your primary responsibilities:

  • Engineer and maintain a robust, secure, and scalable infrastructure for Axiom Cloud.
  • Collaborate with engineering teams to define and refine service level objectives.
  • Contribute to disaster recovery planning, capacity engineering, performance analysis, and system tuning.
  • Promote best practices for code deployments and help educate the broader development team.
  • Roll out tooling and solutions that improve system reliability and reduce manual toil.
  • Address and remediate service incidents and contribute to postmortems and root cause analyses.
  • Foster a culture of monitoring, alerting, and observability across the organization.

Benefits:

  • Flexibility to work from wherever suits you best. For this role, we are considering individuals based in the timezone range UTC-5 (EST) to UTC+2.
  • Budget to build your home office set-up.
  • Monthly budget to support mental and physical wellness.
  • A focus day each week with no meetings, Slack or Zoom. Uninterrupted time to focus on work.
  • Uncapped vacation to unplug and rejuvenate.

Requirements

Do you have experience in Terraform?

  • You have over two years of experience in a reliability-focused engineering environment.
  • You are passionate about system reliability, latency, performance, and efficiency.
  • You're familiar with AWS tools and technologies.
  • You have hands-on experience with Docker, Kubernetes, and Amazon EKS.
  • You understand infrastructure-as-code tools such as Terraform/Pulumi.
  • You possess strong networking knowledge and are adept with Linux systems.
  • You are familiar with CI platforms such as GitHub Actions, GitLab, CircleCI, or others.
  • You can use LLMs effectively.
  • You have experience with monitoring, alerting, and observability tools.

Bonus skills and experiences:

  • Proven track record of maintaining production systems at scale.
  • A software engineering background with expertise in Golang.
