Enterprise Infrastructure Resiliency Engineer

Circle K Stores Inc.

Tempe, United States of America

13 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English, Spanish

Experience level

Senior

Job location

Tempe, United States of America

Tech stack

Microsoft Windows

Amazon Web Services (AWS)

Azure

Cloud Computing

Data Centers

Disaster Recovery

Monitoring of Systems

Hyper-V

Python

Linux System Administration

Linux Servers

Windows Server

Networking Basics

PowerShell

Ansible

Virtualization Technology

vSphere

Google Cloud Platform

System Availability

Software Troubleshooting

Information Technology

Performance Monitor

Nutanix

Network Server

VMware

Job description

We are seeking a highly skilled and hands-on Disaster Recovery Engineer to support and enhance enterprise resiliency, business continuity, and infrastructure recovery capabilities across a large-scale global environment. This role is responsible for coordinating and executing disaster recovery planning, testing, and recovery operations while also serving as a technical contributor across server, infrastructure, cloud, and platform engineering functions.

The ideal candidate will possess a strong blend of operational coordination, infrastructure engineering expertise, and technical troubleshooting capabilities. This individual must be comfortable working across cross-functional IT teams, participating in infrastructure modernization initiatives, and supporting both on-premises and cloud-based environments.

Disaster Recovery & Business Continuity

Develop, maintain, and continuously improve enterprise disaster recovery plans, runbooks, and recovery procedures.
Coordinate and lead disaster recovery testing activities, including tabletop exercises, failover testing, and full recovery simulations.
Validate backup integrity, recovery point objectives (RPO), and recovery time objectives (RTO).
Partner with application, infrastructure, security, networking, and business teams to ensure recovery readiness.
Identify gaps, risks, and dependencies within infrastructure and application recovery processes.
Maintain documentation related to DR architecture, recovery workflows, and operational standards.
Participate in incident response and major outage coordination efforts when required.
Assist with audit, compliance, and governance activities related to business continuity and disaster recovery.

Infrastructure Engineering & Administration

Perform administration and engineering support for enterprise infrastructure environments, including servers, virtualization platforms, storage, cloud, and data center technologies.
Support Windows and/or Linux server environments including provisioning, patching, performance monitoring, and troubleshooting.
Assist with infrastructure modernization, automation, and resiliency initiatives.
Support virtualization technologies such as VMware, Hyper-V, or Nutanix environments.
Participate in infrastructure lifecycle management, capacity planning, and operational support activities.
Troubleshoot infrastructure performance, replication, backup, and connectivity issues.
Collaborate with networking, security, cloud, and operations teams to resolve complex technical problems.
Support infrastructure monitoring, alerting, and operational reporting platforms.

Requirements

5+ years of experience in Disaster Recovery, Infrastructure and Systems Administration, or related IT disciplines.
Strong understanding of disaster recovery methodologies, high availability architectures, and business continuity planning.
Experience supporting enterprise server infrastructure in large-scale environments.
Hands-on experience with Windows Server and/or Linux administration.
Experience with virtualization platforms such as Nutanix, VMware vSphere, or Hyper-V.
Deep understanding of backup and replication technologies.
Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
Knowledge of infrastructure monitoring and operational support processes.
Strong troubleshooting and root cause analysis skills.
Excellent communication, coordination, and documentation abilities.

Experience supporting retail, distributed enterprise, or large multi-site environments that have Point-of-Sale endpoints.
Experience with automation and scripting technologies such as PowerShell, Python, or Ansible.
Familiarity with ITIL operational practices and change management processes.
Experience with data center operations and infrastructure resiliency design.
Knowledge of storage platforms, networking fundamentals, and cybersecurity best practices.
Relevant certifications such as:

VMware VCP
Microsoft Certified
AWS or Azure certifications
CBCP, ISO 22301, or DR-related certifications

Key Competencies

Strong operational ownership and accountability
Ability to remain calm and organized during critical incidents
Cross-functional collaboration and leadership
Process-oriented mindset with attention to detail
Ability to balance strategic resiliency planning with hands-on technical execution

Work Environment

On-site role depending on business needs
Participation in after-hours recovery testing and major incident support may be required
Occasional travel may be required for data center, office, or recovery site support
