Lead Site Reliability Engineer | Copperleaf
Job description
Our Cloud Operations Team, a crucial component of our Software as a Service (SaaS) offering, also delivers Infrastructure as a Service (IaaS) to IFS Copperleaf. The team is built on Site Reliability Engineering principles, and it is expanding. We are committed to the reliability and uptime of our services, and we consistently aim to automate processes and minimize manual labor. We are currently seeking a mid-senior-level cloud engineer to contribute to these services and help improve the operational aspects of each one.
As a Lead Site Reliability Engineer (SRE) specializing in Azure, you will play a pivotal role in architecting, operating, and optimizing our cloud infrastructure. You will lead initiatives to ensure the reliability, scalability, and security of our Azure-based SaaS offerings. You'll mentor junior engineers, drive automation, and partner with development teams to deliver robust, high-availability solutions.
Key Responsibilities
- Lead the design, implementation, and continuous improvement of Azure-based infrastructure for high-availability, mission-critical SaaS services.
- Architect and automate deployment pipelines using Azure DevOps, ARM/Bicep, Terraform, and related tools.
- Own and enhance monitoring, alerting, and incident response for Azure resources (App Services, AKS, SQL, Storage, Networking, etc.).
- Drive root cause analysis and resolution of complex production incidents, collaborating across teams.
- Define and enforce SLOs, SLIs, and SLAs for Azure-hosted SaaS services.
- Champion security best practices, including identity, access, secrets, and certificate management in Azure.
- Mentor and coach junior SREs and CloudOps engineers.
- Partner with development teams to embed reliability and operational excellence into the SDLC.
- Evaluate and implement new Azure features and services to improve reliability, performance, and cost efficiency.
- Document architecture, runbooks, and operational procedures for Azure environments.
Requirements
- 5+ years' experience in SRE, Cloud Operations, or DevOps roles, with at least 3 years focused on Microsoft Azure.
- Deep expertise in Azure services (App Services, AKS, Azure SQL, Storage, Networking, Security Center, Monitor, etc.).
- Strong automation and scripting skills (PowerShell, Python, Bash, or similar).
- Proven experience with Infrastructure as Code (Terraform, ARM/Bicep).
- Advanced troubleshooting of distributed systems, networking, and application performance in Azure.
- Solid understanding of microservices, container orchestration (Kubernetes/AKS), and CI/CD pipelines.
- Experience with monitoring, logging, and observability tools (Azure Monitor, Log Analytics, Application Insights).
- Strong grasp of security protocols, certificate and secret management, and compliance in Azure.
- Demonstrated ability to lead incident response and post-mortem analysis.
- Excellent communication skills and a passion for mentoring others.
- Azure certifications (e.g., Azure Solutions Architect, Azure DevOps Engineer).
- Experience with hybrid or multi-cloud environments, including AWS.
- Familiarity with cost management and optimization in Azure.
- Experience supporting large-scale SaaS platforms.
Benefits & conditions
We embrace flexibility and hybrid work opportunities to support diverse needs and lifestyles, while also valuing inclusive workplace experiences. By fostering a sense of community, we drive innovation, strengthen connections, and nurture belonging. Our commitment ensures you can work in a way that suits you best, while also engaging with colleagues to share ideas and build meaningful relationships.