Site Reliability Engineer (SRE)
Job description
- Mentor engineering teams and evangelize observability best practices, SLIs/SLOs, and reliability culture.
- Contribute to and maintain Tulip's triage and remediation processes for both humans and AI.
- Help architect our systems for growth and scale.
- Implement internal tools to automate common developer tasks.
- Perform incident response and debug production issues across the entire stack.
- Design, build, and maintain the core infrastructure used by all of Tulip's engineering teams.
- Work to automate detection and resolution of recurring issues.
Requirements
- You have experience building and maintaining stable infrastructure at scale.
- You can reason about systems: their edge cases, failure modes, and life cycles.
- You're excited about setting the technical agenda and coming up with novel, broad ideas.
- You keep up with the newest AI advancements in observability and monitoring, and experiment with emerging ways of working.
- You can debug complex issues across the entire stack.
- You're opinionated about the tools and frameworks that work best.
- You enjoy building for other engineers as much as, if not more than, building for a customer.
- You know what a good SLA looks like, and can teach others how to spot one.
What skills do I need?
- You have 5+ years of experience working with open-source observability tools (e.g., the LGTM stack).
- You have hands-on experience instrumenting distributed systems using OpenTelemetry and managing metrics pipelines with Prometheus at scale.
- You have hands-on experience developing and distributing Claude Skills, Gemini Gems, or other general-purpose AI workflows, and can iterate on them to improve their effectiveness.
- You have experience working with time-series data, ideally using PromQL.
- You can pick up new languages and frameworks with ease. We currently run Go and TypeScript services on Kubernetes.
- You can communicate as well as you can code. You understand the value of discussion and work best in a team that champions clear and frequent communication.
Benefits & conditions
We're building a strong, diverse team that values hard work, families, and personal well-being. Benefits of working with us include:
- Direct impact on product and culture
- Company equity
- Competitive benefits package including Health, Dental, Vision, Short-term Disability, Long-term Disability, Life Insurance, AD&D Insurance, Flexible Spending Account (FSA), Commuter Benefits, Parental Leave, and 401(k)
- Flexible work schedule and unlimited vacation policy
- Virtual company events and happy hours
- Fitness subsidies
We are an equal opportunity employer. At Tulip, we celebrate all. Qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Help us build an inclusive community that will transform frontline operations.
The compensation information displayed on each job posting reflects the range for new hire pay rates for the position across all US locations. Within the range posted, actual compensation will be determined depending on multiple factors including job-related knowledge & skills, experience, business needs, geographical location, market compensation data, and internal equity. Expected compensation ranges for this role may change over time. The salary range for this position is $150,000 - $190,000 per year.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Please note that we may use AI-based tools to support parts of our hiring process. All data processing is carried out in compliance with local data protection laws, ensuring all personal candidate information is handled securely and ethically.