Director of Infrastructure and Operations
Job description
The Director of Infrastructure & Operations (I&O) leads the strategy, delivery, and day-to-day operation of the organization's technology. The role is accountable for reliable, secure, and cost-effective infrastructure and operational IT services across a large, distributed footprint as well as home office operations, covering networks/Wi-Fi, identity and collaboration, endpoints, and IT service management. The Director treats internal IT infrastructure as a product (IaaP) with an internal developer platform mindset, prioritizing developer experience, usability, and self-service capabilities for software engineering stakeholders while maintaining strong security and reliability. The Director also partners closely with the internal software teams that build and maintain custom solutions, ensuring strong operational readiness, release discipline, and measurable service outcomes that meet site-level SLAs.
What you'll get:
- Competitive salary and benefits.
- Work with a skilled, collaborative team that makes measurable contributions to organizational strategies and profitability.
- A mission-based company with values you can trust.
- A company with strong brand equity and an excellent reputation in the market.
What you'll do:
- Team leadership: Build and develop high-performing teams across infrastructure, systems, and the service desk; define on-call models and escalation paths.
- Large-scale, multi-site infrastructure leadership: Own strategy, lifecycle, and operations for connectivity (LAN/WAN, Wi-Fi), compute/storage, endpoint and mobile device management, and collaboration platforms.
- Standardization & repeatability: Define standard technology patterns (network designs, device builds, security baselines, and spares) and enable repeatable, automated deployments through documented templates, "golden paths," and self-service provisioning (where appropriate) so development teams can move faster with fewer handoffs and less variability.
- Infrastructure as a Product (IaaP) / Internal Developer Platform: Treat core internal infrastructure capabilities (identity, networking, compute/runtime platforms, endpoint standards, monitoring/observability, secrets/certificates, and deployment standards) as product offerings for internal engineering teams. Deliver developer self-service through a clear service catalog and portal experience (e.g., request/provision workflows, standardized templates, paved-road "golden paths," and reusable modules), integrated with engineering toolchains where appropriate (e.g., CI/CD pipelines, ticketing/ITSM, and approvals). Publish roadmaps; define SLAs/SLOs and platform KPIs (e.g., time-to-provision, change lead time, and developer satisfaction); and continuously improve usability, documentation, and support. Expand automation and infrastructure-as-code practices by standardizing version-controlled blueprints (e.g., Terraform modules), GitOps-based change workflows, continuous delivery practices, automated environment creation, policy-as-code guardrails, automated testing/validation of infrastructure changes, and repeatable rollback patterns, reducing manual work while meeting security and compliance requirements.
- Client venue enablement: Partner with clients to ensure stable connectivity, device readiness, integrations, and support processes; manage escalation paths and operational coordination to protect service during peak meal periods.
- Custom back-office systems (operational partnership): Partner with internal software teams to operationalize custom back-office applications (including Inventory and Menu Builder); ensure environments are monitored, deployments are repeatable, and incidents are triaged effectively.
- Service management, field support & SLAs: Lead ITIL-aligned processes and support operating models (service desk, escalations) with a focus on fast restoration and meeting SLAs/OLAs.
- Location onboarding & rollouts: Support new client launches, site transitions, and rollouts; ensure that readiness checklists, connectivity validation, device provisioning, and cutover support are executed reliably, repeatably, and at scale.
- Availability, resiliency & DR: Define and test disaster recovery and continuity plans for critical services (network, identity, collaboration, custom field systems, and key integrations); establish recovery objectives and improve resilience.
- Operational excellence & observability: Implement monitoring/alerting for networks, endpoints, and key platforms; standardize runbooks; and expand automation for patching, provisioning, and routine maintenance at scale. Operationalize IaC by implementing drift detection, controlled promotion of changes through environments, automated compliance checks, and (where appropriate) auto-remediation for common issues. Track platform telemetry that matters to engineering stakeholders (e.g., provisioning latency, change failure rate, and incident impact) to drive continuous improvement.
- Security, privacy & compliance: Partner with Information Security to meet PCI DSS (as applicable), FERPA-aligned expectations at education sites, and corporate security requirements; drive vulnerability remediation, endpoint hardening, identity controls, and logging while minimizing operational disruption.
- Release & change governance: Lead change/release governance for infrastructure and field systems; ensure change windows, communication, rollback plans, and site readiness are in place.
- Vendor & contract management: Manage MSPs, carriers/ISPs, and hardware vendors; oversee SLAs/SOWs, dispatch processes, spares strategy, and performance reviews; coordinate with vendor partners on support outcomes.
- Financial management: Own cost optimization; manage lifecycle refresh and standard configurations to reduce support cost and downtime.
Requirements
- Bachelor's degree in Information Technology, Computer Science, or related field (or equivalent practical experience).
- At least ten years of progressive experience in IT infrastructure and operations, including at least four years in people management or leadership roles.
- Demonstrated experience supporting large, distributed, customer-facing operations (100 to 300 locations preferred) with time-sensitive service windows and high availability expectations.
- Strong knowledge of core infrastructure domains (networking/Wi-Fi, systems, identity, endpoint management, backup/DR, and monitoring).
- Experience operating and supporting business-critical applications in production in partnership with software engineering teams.
- Experience with cloud platforms and operating models (e.g., Azure/AWS), including governance and cost management.
- Experience implementing and managing IT service management practices and tooling (e.g., ServiceNow or similar), including incident/problem/change disciplines and SLA performance.
- Proven ability to lead through major incidents with clear communications, effective vendor coordination, and structured root-cause analysis.
Preferred:
- Experience in contract dining, or in supporting technology for contract dining operations.
- Experience supporting site launches, transitions, acquisitions, or large-scale rollouts.
- Experience with modern operations practices for software (e.g., SRE/DevOps concepts, CI/CD change controls, observability) in partnership with engineering teams.
- Experience with PCI DSS programs, payment ecosystems, and supporting audits/assessments.
- Relevant certifications (e.g., ITIL, CISSP/CCSP, Azure/AWS, Microsoft 365, Cisco, VMware, or PMP).
- Experience building internal developer platforms and self-service tooling (e.g., developer portals, service catalogs, golden paths/templates) and operating IaC/GitOps practices with policy-as-code guardrails.
- Experience with automation/infrastructure-as-code (e.g., Terraform, Ansible, scripting).
About SAGE