Site Reliability Engineer
Job description
As our Site Reliability Engineer, you will provide:

* Site Reliability Engineering and Operational Readiness:
  - Define, implement, and support SRE practices that improve reliability, availability, performance, and operational maturity.
  - Assess applications, platforms, and infrastructure services to establish operational requirements, support standards, and readiness for production within an SRE model.
* SLO, Monitoring, and Service Health Management:
  - Develop and maintain service level objectives, service level indicators, alerting standards, and operational metrics aligned to business and technology needs.
  - Monitor service performance and reliability trends across cloud and hybrid environments, and partner with teams to reduce toil and strengthen service health.
* Azure Cloud Platform Reliability and Support:
  - Support and improve Microsoft Azure environments, with a focus on reliability, scalability, operational supportability, and platform resilience.
  - Contribute to cloud reliability through automation, observability, incident reduction, capacity planning, and operational controls.
* Hybrid Infrastructure and Enterprise Platform Support:
  - Support hybrid infrastructure components that underpin application and service reliability, including VMware, Windows Server, Active Directory, F5 load balancers, and Cisco-based infrastructure.
  - Identify and address reliability, performance, and support risks across both cloud and on-premises environments, including key infrastructure dependencies and traffic flow considerations.
* Project Readiness and Reliability Engineering:
  - Review projects, applications, and platform changes to identify what is required for successful support in an SRE environment.
  - Evaluate architecture, dependencies, monitoring, alerting, failover, deployment methods, recoverability, and support processes before production release.
* Incident Response and Problem Resolution:
  - Lead or support troubleshooting of complex production issues, service degradations, and reliability concerns across cloud and hybrid platforms.
  - Participate in incident response, root cause analysis, and post-incident reviews, and drive follow-up actions that improve resilience and reduce repeat issues.
* Automation, Observability, and Continuous Improvement:
  - Improve operational efficiency through automation, instrumentation, and standardization of support processes, using tools such as Shell, PowerShell, Python, Terraform, or similar technologies.
  - Support observability and APM platforms such as New Relic, Datadog, ELK Stack, Splunk, Dynatrace, and related monitoring, logging, dashboarding, and alerting solutions.
* Cross-Functional Communication and Technical Leadership:
  - Work closely with application teams, architects, DevOps engineers, infrastructure teams, and project stakeholders to align reliability expectations and support models.
  - Translate complex technical issues into clear operational guidance, readiness requirements, documentation, and status reporting for both technical teams and leadership.
These duties must be performed with or without reasonable accommodation. We know experience comes in many forms and that many skills are transferable. If your experience is close to what we're looking for, consider applying. Diversity has made us the entrepreneurial and innovative company that we are today.
Requirements
We are in search of a seasoned Site Reliability Engineer. Our ideal candidate brings at least 10 years of hands-on experience across cloud and enterprise infrastructure, with strong expertise in site reliability engineering, operational readiness, observability, and platform support. The selected candidate will have deep experience in Microsoft Azure, along with strong working knowledge of hybrid infrastructure components such as VMware, Windows Server administration, Active Directory, F5 load balancers, and Cisco-based environments. The ideal candidate will be highly technical, capable of evaluating services and projects through an SRE lens, and able to define what is required to make applications and platforms operationally ready, resilient, and supportable in production. This role also requires excellent communication skills to effectively partner with engineering, architecture, infrastructure, and leadership teams.
- A bachelor's degree or equivalent work experience, with a minimum of 10 years' experience in SRE, infrastructure engineering, systems engineering, or a related role supporting both cloud and on-premises environments.
- Strong hands-on experience with Microsoft Azure is required. AWS experience is welcome, but Azure is a core requirement for this role. Azure certifications such as AZ-104, AZ-305, AZ-400, or AZ-500 are desirable.
- Proficiency with hybrid infrastructure technologies including VMware, Windows Server administration, Active Directory, DNS, load balancers such as F5, and Cisco-based enterprise infrastructure.
- Experience supporting Azure cloud services and platform components related to compute, networking, identity, monitoring, security, and operational support.
- Strong experience with observability, monitoring, and APM platforms such as New Relic, Datadog, ELK Stack, Splunk, Dynatrace, or similar tools.
- Experience defining and supporting SLOs, SLIs, alerting standards, dashboards, incident response, root cause analysis, and service health metrics.
- Strong automation and scripting experience using tools such as Terraform, ARM, Bicep, Ansible, Shell, PowerShell, Python, or similar technologies used to improve reliability and operational efficiency.
- Knowledge of PCI or similar security posture, audit procedures, and CIS security hardening standards.
Benefits & conditions
- Medical, Dental and Vision insurance for you and your family
- Relax and recharge with Paid Time Off (PTO)
- 6 company-observed paid holidays, plus 3 paid floating holidays
- 401(k) (after 90 days) plus employer match up to 4%
- Pet Insurance for your furry family members
- Wellness perks including onsite fitness equipment at both locations, EAP, and access to the Headspace App
- We invest in your future through Tuition Reimbursement
- Save on taxes with Flexible Spending Accounts
- Peace of mind with Life and AD&D Insurance
- Protect yourself with company-paid Long-Term Disability and voluntary Short-Term Disability
Concora Credit provides equal employment opportunities to all Team Members and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Employment-based visa sponsorship is not available for this role. Concora Credit is an equal opportunity employer (EEO). Please see the Concora Credit Privacy Policy for more information on how Concora Credit processes your personal information during the recruitment process and, if applicable, based on your location, how you can exercise your privacy rights. If you have questions about this privacy notice or need to contact us in connection with your personal data, including any requests to exercise your legal rights referred to at the end of this notice, please contact .