Senior SRE - INTL MX
Job description
An Insight Global Fortune 500 client is seeking an experienced Site Reliability Engineer (SRE) to support enterprise-scale systems deployed across Google Cloud Platform (GCP) and on-premise/in-store environments. This role is not focused on application development or coding, but instead centers on deployment support, observability, reliability, and operational excellence. The SRE will be embedded within the development lifecycle, partnering closely with engineering teams to ensure systems are resilient, reliable, and production-ready.
The ideal candidate is highly self-sufficient, leverages AI tools to accelerate troubleshooting and operational decision-making, and brings strong enterprise experience supporting complex, distributed environments.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Requirements
- 4-6 years of experience working as a Site Reliability Engineer
- Hands-on experience supporting deployments to:
  - Google Cloud Platform (GCP)
  - On-premise or in-store server environments
- No application coding responsibilities
  - Primary focus is on deployment support, configuration, validation, and building dashboards
- Proven ability to:
  - Validate and test deployments to ensure production readiness
  - Confirm changes meet reliability and resiliency standards before release
- Deep knowledge of:
  - Observability, telemetry, and monitoring
  - Resiliency, reliability, and system health validation
- Experience with incident management, including detection, response, and resolution
- Ability to assess and verify that infrastructure and deployment changes are stable and reliable
- Comfortable being embedded within the development lifecycle, collaborating with engineering teams from pre-deployment through post-release
- Demonstrated ability to leverage AI tools to solve traditional SRE/operational problems independently (high level of self-sufficiency)
- Experience operating in enterprise-scale environments with complex systems and multiple stakeholders
- Retail industry experience
- Leveraging AI in the SRE cycle