Site Reliability Engineer
Job description
You are an engineer, a builder, and a systems thinker. You'll create middleware and platform guardrails that empower developers to innovate quickly and reliably. You combine deep technical judgment with empathy to eliminate customer pain, especially when working with enthusiastic teams stewarding the world's most privileged data.
You uplift those around you, act as a subject matter expert, mentor others, and drive change. You chase contributing factors over root causes, value code over documentation, and documentation over process. You'll engage in and often lead architectural discussions, reduce toil, and deliver scalable, resilient platforms that support our customers and organization.
As a Senior SRE, you'll help scale our cloud platform, collaborate across teams to promote standardization and resiliency, and participate in on-call rotations. You'll be a key voice in observability, change management, and service scalability, providing guidance during complex technical decisions and high-impact events.
iManage is experiencing explosive growth in its flagship cloud product. We're seeking senior software and systems engineers specializing in reliability and platform services to join our transformative cloud journey. This requires rethinking technical decisions with a beginner's mindset and a focus on resilience and sustainability. If you write code, think in systems, embrace complexity and automation, and are passionate about service resilience and scalability, we want to talk to you.
iM Responsible For…
- Eliminating toil through automation and software development.
- Partnering cross-functionally with application teams and internal stakeholders.
- Creating a modern, cloud-native platform that is resilient, cost-effective, and secure by default.
- Scaling cloud infrastructure to support our Kubernetes-based ecosystem.
- Maintaining the freshness and utility of platform services.
- Improving the security posture of our products.
- Designing automation, orchestration, observability, and disaster readiness into our products.
- Participating in production support and on-call rotations, providing senior-level guidance during critical events.
- Leading incident management and post-incident retrospectives, and coaching teams in these practices.
- Join a rapidly evolving, industry-leading SaaS company on an exciting journey of growth and scalability!
- Take on meaningful, high-impact challenges by leveraging cutting-edge technologies and best-in-class protocols to drive innovation.
- Own my career path with our internal development framework. Ask us more about this!
- Expand my skill set and earn certifications with unlimited access to LinkedIn Learning courses and interactive Microsoft courses & training.
- Be part of a supportive and experienced team within a dynamic, inclusive, and encouraging culture.
- Enjoy flexible work hours that empower me to balance personal time with professional commitments.
- Collaborate in a modern, open-plan workspace featuring a gaming area, free snacks and drinks, and regular social events.
iManage Is Supporting Me By...
- Creating an inclusive environment where I'm encouraged to help shape the culture by bringing my unique perspective, not just by fitting in.
- Providing a market-leading salary determined through a fair and consistent process, equitable for all employees, and regularly reviewed against industry benchmarks.
- Rewarding me with an annual performance-based bonus.
- Providing enhanced parental leave (20 weeks for the primary caregiver and 10 weeks for the secondary caregiver, at 100% pay).
- Matching my pension contribution (up to 6%).
- Offering BUPA private medical insurance and a Simplyhealth cash plan to assist with everyday costs.
- Providing group life cover, including life insurance, income protection, and critical illness protection.
- Encouraging me to make use of our top-tier flexible time off policy, which includes 25 days of annual leave and the flexibility to take additional time off as needed.
- Having multiple company wellness days each year to prioritize mental health and well-being.
- Providing access to RethinkCare, a global behavioral health platform that enhances personal well-being, strengthens professional resilience, and empowers parental success through expert-led training and resources.
Requirements
- Experience writing design documents and postmortems, and refactoring application code.
- Experience building automation to reduce operational burden or developing internal SaaS tools.
- Ability to advocate for SRE principles (e.g., SLOs vs SLAs) and introduce them effectively.
- Experience in public cloud or hosted datacenter environments (Azure and AKS preferred).
- A passion for collaborative teamwork and influencing reliability best practices across teams.
Bonus Points If I Have...
- Hands-on experience with Linux server stacks (Ubuntu/Debian preferred).
- Knowledge of cloud provisioning platforms (Terraform preferred).
- Exposure to configuration management tools (Chef preferred).
- Experience with containerization/clustering technologies (Docker preferred).
- Familiarity with observability and alerting tools (Prometheus/Grafana or ELK/EFK).
- Practical experience with CI/CD pipelines and rollout strategies.
- A bachelor's degree (or equivalent experience) in Computer Engineering or a related field.
- Proficiency in one or more programming languages (e.g., Java, Python, Golang).
- Familiarity with scripting languages (e.g., PowerShell, Bash, Python, Ruby).