Senior Site Reliability Engineer
Job description
We are looking for a Senior Site Reliability Engineer (Senior SRE) to lead reliability initiatives for high-impact services. In this role, you will own the reliability, scalability, and performance of one or more critical systems; lead the design and implementation of automation to eliminate toil and reduce operational risk; drive improvements in observability, incident response, and production readiness across teams; and partner closely with product engineering, platform, security, and release management to ship changes safely and quickly.
Senior SREs at Docusign operate as hands-on technical leaders: they set the reliability bar for their domain, mentor other engineers, and lead cross-functional projects that materially improve availability and customer experience. Ideally, you have a background in software development, incident management, service catalogs, request tracing systems, time series telemetry platforms, application performance management tools, or log management tools. The role requires an on-call rotation every 4 weeks.
This position is an individual contributor role reporting to the Senior Manager, SRE.
Responsibilities
- Design, implement, and operate highly available, scalable services in cloud environments (primarily Azure, with some multi-cloud scenarios)
- Define and evolve SLOs/SLIs, error budgets, and capacity strategies for owned services; use them to guide engineering tradeoffs and release decisions
- Analyze patterns in incidents and outages; own long-term reliability improvements for your domain and contribute to reliability strategy across services
- Write high-quality code that is easy to maintain and test
- Ensure design and architecture are extensible across projects, and participate in technical design and code reviews
- Identify operational toil and lead automation efforts to eliminate it through deployment, runbook, and remediation workflows that make incidents rarer and faster to resolve
- Develop robust, well-tested tooling and shared libraries that are adopted across multiple teams
- Improve CI/CD pipelines and guardrails to reduce change failure rate while increasing deployment velocity
- Design and implement logging, metrics, tracing, and alerting for complex distributed systems; ensure signals are actionable and aligned to business impact
- Build and automate tools and solutions for incident impact analysis and effective mitigation
- Participate in and often lead incident response for Sev0-Sev2 events: triage, mitigation, coordination, and clear communication
- Perform and contribute to blameless post-incident reviews, root-cause analysis, and follow-through on corrective actions
- Work with Operations and Incident Command teams during and after incidents to drive excellence in the incident management process
- Compose and analyze dashboards to highlight areas of the business that need attention and help drive organizational KPIs
- Create and respond to system-generated alerts to maintain system health
- Work with Operations and Engineers to fill any gaps in alerting and telemetry
- Act as the primary SRE partner for one or more engineering teams, shaping architecture, reviewing designs, and embedding reliability best practices
- Mentor and coach other SREs and software engineers on topics such as debugging, observability, incident management, and performance optimization
- Contribute to and help standardize SRE practices, runbooks, and production readiness criteria across CPE and product teams
- Work with Product Management, collaborators and other developers to understand design requirements and provide estimates for development
- Learn and grow in all key technologies at Docusign and be a partner to Engineering and Operations teams
Job Designation
Remote: Employee is not required to be in or near an office frequently and works from a designated remote work location for the majority of the time.
Positions at Docusign are assigned a job designation of either In Office, Hybrid or Remote and are specific to the role/job. Preferred job designations are not guaranteed when changing positions within Docusign. Docusign reserves the right to change a position's job designation depending on business needs and as permitted by local law.
What you bring
Requirements
We are looking for a self-motivated, driven and creative Senior Site Reliability Engineer to join the Site Reliability team. Metrics and analytics drive engineering at Docusign and ensure that we are dedicating valuable engineering cycles to the right places. This role is a unique opportunity to impact the entire Docusign team and drive adoption.
- 8+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles with ownership of production systems at scale (or equivalent experience)
- Experience coding in at least one modern language (e.g., Go, Python, C#, Java), with the ability to design, implement, test, and debug production-grade automation and services
- Practical experience operating large-scale services in public cloud (Azure preferred; AWS/GCP acceptable with willingness to learn Azure)
- Experience with Linux, networking fundamentals, and common infrastructure components (load balancers, DNS, certificates, queues, caches, databases)
- Experience with observability stacks (e.g., Prometheus/Grafana, OpenTelemetry/Chronicle, centralized logging)
- Experience with CI/CD systems and deployment strategies (blue/green, canary, rolling updates)
- Experience with incident management and on-call operations for 24x7 services
- Experience building dashboards and analyzing metrics
Preferred
- Strong analytical and problem-solving skills
- Experience in high-availability, regulated, or customer-facing SaaS environments
- Background in reliability practices such as chaos testing, capacity modeling, and performance tuning
- Exposure to release management/unified release practices and safe rollout strategies (feature flags, staged rollouts, configuration-driven changes)
- Demonstrated leadership driving cross-team initiatives: reliability programs, migrations, or major refactors
- Strong written and verbal communication skills; ability to explain complex technical topics to both engineers and non-technical stakeholders
Benefits & conditions
California: $157,500.00 - $254,350.00 base salary
Illinois, Colorado, Massachusetts and Minnesota: $151,200.00 - $213,600.00 base salary
Washington, Maryland, New Jersey and New York (including NYC metro area): $151,200.00 - $222,450.00 base salary
Washington DC: $157,500.00 - $222,450.00 base salary
Ohio: $131,900.00 - $186,275.00 base salary
This role is also eligible for the following:
- Bonus: Sales personnel are eligible for variable incentive pay dependent on their achievement of pre-established sales goals. Non-Sales roles are eligible for a company bonus plan, which is calculated as a percentage of eligible wages and dependent on company performance.
- Stock: This role is eligible to receive Restricted Stock Units (RSUs).
Global benefits provide options for the following:
- Paid Time Off: earned time off, as well as paid company holidays based on region
- Paid Parental Leave: take up to six months off with your child after birth, adoption or foster care placement
- Full Health Benefits Plans: options for 100% employer paid and minimum employee contribution health plans from day one of employment
- Retirement Plans: select retirement and pension programs with potential for employer contributions
- Learning and Development: options for coaching, online courses and education reimbursements
- Compassionate Care Leave: paid time off following the loss of a loved one and other life-changing events
Life at Docusign
Working here
Docusign is committed to building trust and making the world more agreeable for our employees, customers and the communities in which we live and work. You can count on us to listen, be honest, and try our best to do what's right, every day. At Docusign, everything is equal.
We each have a responsibility to ensure every team member has an equal opportunity to succeed, to be heard, to exchange ideas openly, to build lasting relationships, and to do the work of their life. Best of all, you will be able to feel deep pride in the work you do, because your contribution helps us make the world better than we found it. And for that, you'll be loved by us, our customers, and the world in which we live.