Lead Site Reliability Engineer
Job description
Principal Site Reliability Engineers value reliability above all else. Every solution is methodically engineered to ensure maximum availability and reliability. Principal SREs lead the most complex investigations and build the most robust, fault-tolerant solutions.
Join the pioneering Site Reliability Engineering team at Dayforce, where we lead the charge in ensuring our state-of-the-art products set new benchmarks in scalability, availability, and reliability. We embrace the SRE engagement model to deliver exceptional performance. As a member of our team, you'll help build and maintain a suite of internal tools that proactively alert on, report on, and autonomously remediate Dayforce's environments, ensuring seamless service. We're committed to proactive solutions, crafting robust processes, empowering our talented developers with the latest technology, and engineering innovative remedies that prevent issues from recurring. If you're passionate about pushing the boundaries of site reliability engineering, join us at Dayforce and become part of a team that thrives on challenges and celebrates continuous improvement. Elevate your career with us and be part of a transformative journey in the world of SRE.
Please note that for the first 2 months of this role you will work North American hours; after this initial period, you will move to UK office hours.
What You'll Get To Do
- Learn about Dayforce's cloud infrastructure and the applications that run on it to build a full mental model of how the Dayforce ecosystem works.
- Build end-to-end solutions that provide enhanced observability or automatically remediate the most complex issues.
- Seek out, propose, and execute projects that improve Dayforce's reliability and SRE processes and reduce day-to-day toil.
- Lead the most complex incidents, publish root cause documents, and ensure all remediation actions are robust and automated to prevent repeat incidents.
- Provide oversight to development teams to ensure that features are built with reliability in mind.
- Contribute to the inner-source SRE repository.
- Develop trusted relationships with all parts of Dayforce's business.
- Participate in PagerDuty on-call rotations as required.
Dayforce encourages personal and professional growth. We offer excellent time-away-from-work programs, comprehensive wellness initiatives, and recognition through competitive pay and benefits. With a commitment to community impact, including volunteer days and our charity, Dayforce Cares, we provide opportunities for you to thrive in both your career and your personal life. Our focus is not just on your job but on supporting you to be the best version of yourself.
Requirements
- Proven ability to lead teams of engineers and/or technical projects.
- Experience operating highly distributed systems.
- Experience with Incident Management best practices.
- Ability to identify and resolve performance bottlenecks.
- Experience with Cloud Platforms (Azure Preferred).
- Experience implementing APM and Observability platforms and integrating them into day-to-day operations.
- Experience with at least one object-oriented programming language (C# and Java preferred).
- Experience with at least one scripting language (Python and PowerShell preferred).
- Experience with at least one database engine and query language (MSSQL / T-SQL and PostgreSQL / PL/pgSQL preferred).
- Excellent communication and collaboration skills are mandatory.
- Ability to gather requirements from multiple stakeholders and execute on those defined requirements.
- Proven ability to consistently deliver solutions on time.
- Ability to handle high-stress situations.
- Knowledge of containerization and Kubernetes is considered an asset.
- Knowledge of Terraform and other IaC tools and principles is considered an asset.