Director, Site Reliability Engineer | Senior Engineering Team Director
Job description
We're seeking a Site Reliability Engineering (SRE) Lead to design, build, and maintain resilient, high-scale systems supporting BlackRock's Private Markets platform. In this hands-on leadership role, you'll apply deep engineering expertise to solve complex challenges, guide a global team, shape technical direction, and communicate effectively with senior stakeholders, ensuring the reliability of mission-critical systems that power private market investment workflows and decision-making. You will drive the adoption of AI-driven solutions to accelerate incident detection and triage, reduce toil, improve forecasting and capacity planning, and strengthen end-to-end observability and resilience.
Role Responsibilities
- Take ownership of project priorities, deadlines and deliverables using Agile methodologies, with clear outcomes around reliability automation and AI-enabled operations
- Understand and refine business and functional requirements, translating them into SLOs/SLIs and AI-assisted observability and support capabilities
- Take a hands-on approach to getting work done: this role requires a "roll up your sleeves" mentality, including building and operationalizing reliability tooling and automation that measurably reduces toil and improves stability
- Lead with vision and partner in brainstorming solutions that improve team productivity, efficiency, and overall engineering effectiveness
- Drive priority setting for the engineering teams, balancing foundational reliability work with delivery of new product features
- Improve engineering culture by encouraging a continuous focus on reliability across the entire application lifecycle, and by adopting AI-enabled SRE practices (e.g., intelligent alerting, automated diagnosis, and self-healing where appropriate)
- Participate proactively in architectural and design decisions, including AI-ready telemetry, data quality, and model integration patterns for operational analytics
- Design and implement end-to-end monitoring solutions for application and infrastructure components, leveraging modern observability platforms plus AI/ML techniques for anomaly detection, correlation, and alert noise reduction
- Drive the engineering of capacity management and demand forecasting solutions, including predictive analytics/ML approaches where they add measurable value
- Act as a culture carrier and leader, passing on SRE knowledge and best practices to the engineering team
- Drive detailed root cause investigations for production incidents with rigorous focus on issue avoidance, using AI-assisted correlation/analysis to accelerate time-to-insight
- Create and coordinate retrospectives for significant incidents, ensuring learnings are captured in automated or AI-assisted runbooks and embedded into prevention mechanisms
- Perform additional core engineering functions, such as adding custom telemetry (metrics, logs, and traces) to the code base of in-scope applications to enable AI/ML-driven operational insights
- Anticipate new opportunities to continuously evolve the resiliency profile of scoped applications and infrastructure
Requirements
Must Have
- B.S. / M.S. degree in Computer Science, Engineering, or a related discipline with 10+ years of experience
- Experience leading high-performing engineering/SRE teams, with a track record of driving continuous improvement through automation and AI-enabled operations
- Demonstrated ability to represent engineering/SRE priorities, status, and risk to senior leadership stakeholders with clear, executive-ready communication
- Hands-on experience building or operating AI-assisted capabilities (AIOps, ML-based anomaly detection, or GenAI workflows) in an engineering/production environment
- A passion for providing engineering support for highly available, performant full stack applications with a "Student of Technology" attitude
- Experience with relational and NoSQL databases (e.g., Redis, Apache Cassandra)
Benefits & conditions
To help you stay energized, engaged and inspired, we offer a wide range of employee benefits, including: retirement investment and tools designed to help you build a sound financial future; access to education reimbursement; comprehensive resources to support your physical health and emotional well-being; family support programs; and Flexible Time Off (FTO) so you can relax, recharge and be there for the people you care about.