Enterprise Infrastructure Resiliency Engineer

Circle K Stores Inc.
Tempe, United States of America
13 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English, Spanish
Experience level
Senior

Job location

Tempe, United States of America

Tech stack

Microsoft Windows
Amazon Web Services (AWS)
Azure
Cloud Computing
Data Centers
Disaster Recovery
Monitoring of Systems
Hyper-V
Python
Linux System Administration
Linux Servers
Windows Server
Networking Basics
PowerShell
Ansible
Virtualization Technology
vSphere
Google Cloud Platform
System Availability
Software Troubleshooting
Information Technology
Performance Monitor
Nutanix
Network Server
VMware

Job description

We are seeking a highly skilled and hands-on Disaster Recovery Engineer to support and enhance enterprise resiliency, business continuity, and infrastructure recovery capabilities across a large-scale global environment. This role is responsible for coordinating and executing disaster recovery planning, testing, and recovery operations while also serving as a technical contributor across server, infrastructure, cloud, and platform engineering functions.

The ideal candidate will possess a strong blend of operational coordination, infrastructure engineering expertise, and technical troubleshooting capabilities. This individual must be comfortable working across cross-functional IT teams, participating in infrastructure modernization initiatives, and supporting both on-premises and cloud-based environments.

Disaster Recovery & Business Continuity

  • Develop, maintain, and continuously improve enterprise disaster recovery plans, runbooks, and recovery procedures.
  • Coordinate and lead disaster recovery testing activities, including tabletop exercises, failover testing, and full recovery simulations.
  • Validate backup integrity, recovery point objectives (RPO), and recovery time objectives (RTO).
  • Partner with application, infrastructure, security, networking, and business teams to ensure recovery readiness.
  • Identify gaps, risks, and dependencies within infrastructure and application recovery processes.
  • Maintain documentation related to DR architecture, recovery workflows, and operational standards.
  • Participate in incident response and major outage coordination efforts when required.
  • Assist with audit, compliance, and governance activities related to business continuity and disaster recovery.

Infrastructure Engineering & Administration

  • Perform administration and engineering support for enterprise infrastructure environments, including servers, virtualization platforms, storage, cloud, and data center technologies.
  • Support Windows and/or Linux server environments, including provisioning, patching, performance monitoring, and troubleshooting.
  • Assist with infrastructure modernization, automation, and resiliency initiatives.
  • Support virtualization technologies such as VMware, Hyper-V, or Nutanix environments.
  • Participate in infrastructure lifecycle management, capacity planning, and operational support activities.
  • Troubleshoot infrastructure performance, replication, backup, and connectivity issues.
  • Collaborate with networking, security, cloud, and operations teams to resolve complex technical problems.
  • Support infrastructure monitoring, alerting, and operational reporting platforms.

Requirements

  • 5+ years of experience in Disaster Recovery, Infrastructure and Systems Administration, or related IT disciplines.
  • Strong understanding of disaster recovery methodologies, high availability architectures, and business continuity planning.
  • Experience supporting enterprise server infrastructure in large-scale environments.
  • Hands-on experience with Windows Server and/or Linux administration.
  • Experience with virtualization platforms such as Nutanix, VMware vSphere, or Hyper-V.
  • Deep understanding of backup and replication technologies.
  • Familiarity with cloud platforms such as AWS, Azure, or Google Cloud.
  • Knowledge of infrastructure monitoring and operational support processes.
  • Strong troubleshooting and root cause analysis skills.
  • Excellent communication, coordination, and documentation abilities.

  • Experience supporting retail, distributed enterprise, or large multi-site environments that have Point-of-Sale endpoints.
  • Experience with automation and scripting technologies such as PowerShell, Python, or Ansible.
  • Familiarity with ITIL operational practices and change management processes.
  • Experience with data center operations and infrastructure resiliency design.
  • Knowledge of storage platforms, networking fundamentals, and cybersecurity best practices.
  • Relevant certifications such as:
      • VMware VCP
      • Microsoft Certified
      • AWS or Azure certifications
      • CBCP, ISO 22301, or DR-related certifications

Key Competencies

  • Strong operational ownership and accountability
  • Ability to remain calm and organized during critical incidents
  • Cross-functional collaboration and leadership
  • Process-oriented mindset with attention to detail
  • Ability to balance strategic resiliency planning with hands-on technical execution

Work Environment

  • On-site role depending on business needs
  • Participation in after-hours recovery testing and major incident support may be required
  • Occasional travel may be required for data center, office, or recovery site support