Senior Site Reliability Engineer, Platform Responsibility - USDS
Job description
- administration and operational efficiency, leveraging AI-assisted development tools to accelerate delivery and improve code quality.
- Participate in regular on-call rotations as part of a team that provides 24-hour coverage across multiple shifts.
- Engage in and improve the whole lifecycle of services, from inception and design through development, capacity planning, and launch reviews to deployment, operation, and refinement.
- Practice sustainable user support, incident response, and postmortems.
- Drive the long-term roadmap for system administration tools, and build internal platforms that leverage AI-assisted development to eliminate toil and improve engineering velocity.
- Serve as a primary Incident Commander for high-severity issues, leading cross-functional teams and ensuring that technical resolution aligns with business priorities.
- Partner with Product and Development teams from the design phase onward to ensure that observability, scalability, and disaster recovery are core components of every new feature.
- Foster a culture of sustainable operations by mentoring junior engineers and evangelizing SRE principles throughout the organization.
Requirements
Minimum Qualifications:
- Bachelor's degree or above in Computer Science or a related technical discipline.
- At least 5 years of professional experience in SRE, DevOps, or infrastructure engineering.
- Proven experience integrating AI/LLM APIs into internal workflows, specifically for log analysis, alert contextualization, or diagnostic assistance.
- Deep understanding of Unix/Linux system internals, networking fundamentals, and distributed systems architecture.
- Expertise in designing and scaling observability stacks using tools such as Prometheus, Grafana, or Datadog.
- Demonstrated ability to troubleshoot complex, non-obvious production issues across the entire stack.
Preferred Qualifications:
- Experience building agentic workflows and orchestration frameworks to assist with incident triage and runbook matching.
- Mastery of AI-assisted development tools to accelerate infrastructure-as-code delivery and automate documentation.
- Deep technical proficiency in container orchestration with Kubernetes and in managing big data technologies such as Kafka.
Benefits & conditions
Part of ByteDance. Seattle, WA (hybrid work). $177,688 - $341,734 a year.
- Paid parental leave
- Health insurance
- 401(k) matching
- Vision insurance
- Dental insurance
- Paid sick time
The base salary range for this position in the selected city is $177,688 - $341,734 annually.
Compensation may vary outside of this range depending on a number of factors, including a candidate's qualifications, skills, competencies and experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives and restricted stock units.
Benefits may vary depending on the nature of employment and the country of the work location. Employees have day-one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, and wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire, with accruals increasing by tenure).
The Company reserves the right to modify or change these benefits programs at any time, with or without notice.
For Los Angeles County (unincorporated) Candidates:
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:
- Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;
- Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and
- Exercising sound judgment.
About USDS