IT Service Operations Manager
Job description
Baldor is seeking an IT Service Operations Manager to join our Technology Team, reporting to the Director of Infrastructure & Cybersecurity. Baldor's Technology organization delivers reliable, scalable, and secure technology services that power our distribution centers, corporate offices, and customer-facing platforms.
This role leads our 24/7 IT Service Desk while also owning the technology incident management program and our department-wide reliability and IT resiliency practices. The IT Service Operations Manager delivers responsive support to end users across our Bronx headquarters and branch distribution centers, manages production incidents with urgency, and drives an operational excellence agenda that develops the team, reduces incident frequency, and improves observability across our services.
This is a leadership-focused role grounded in deep technical experience. The manager is expected to roll up their sleeves, with no task too small, to guide complex incidents and build new operational capabilities. They partner with Infrastructure, Security, Product, and business stakeholders to translate operational needs into standard operating procedures (SOPs) that drive day-to-day execution, proactively prevent incidents, and elevate the end user experience.
KEY RESPONSIBILITIES
- Lead 24/7 IT Service Desk operations across three shifts, ensuring tickets meet defined SLAs across our distribution centers
- Coach and develop Service Desk supervisors and technicians, fostering a culture of urgency, accountability, and continuous improvement
- Own the technology incident management program, including the on-call schedule, major incident response, executive communications, post-incident reviews, and corrective actions to prevent recurrence
- Establish reliability targets for critical systems, mature monitoring and alerting, and formalize on-call paging and escalation
- Leverage AI and automation to drive efficiencies across the team. Baldor actively encourages AI-augmented tooling that reduces repetitive work, accelerates response, and improves uptime, and this role has latitude to introduce new capabilities and build an automation-first mindset
- Partner with Infrastructure, Security, and Product teams to design for resiliency, and maintain business continuity procedures, failover testing, and recovery runbooks
- Own endpoint management and the device lifecycle, including Intune-based MDM for Windows, macOS, iOS, and Android; imaging, patching, and compliance state; hardware refresh, asset inventory, and procurement coordination
- Own identity and access management (IAM) operations for end users, including onboarding and offboarding, role and access changes, provisioning standards, and account lifecycle hygiene
- Configure and continually improve our Zendesk and Jira Work Management platforms to streamline ticketing workflows, surface meaningful reporting, and deliver a better end user experience
- Manage value-added reseller (VAR) and vendor relationships, owning SLAs, performance reviews, and quality outcomes
WHAT SUCCESS LOOKS LIKE
- Service Desk meets or exceeds SLA, response time, and customer satisfaction targets across all shifts.
- Major incidents are managed with clear communication, fast resolution, and rigorous follow-through on corrective actions.
- Reliability targets, mature monitoring and alerting, and a formal on-call/paging program are in place, with a measurable downward trend in incident frequency.
- IT resiliency plans are documented, tested, and trusted by the business.
- Endpoint compliance, asset hygiene, and IAM operations are auditable, accurate, and meet our security and compliance standards.
- The Service Desk team is engaged, developing in their careers, and delivering an excellent end user experience.
WORK ENVIRONMENT & SCHEDULE
- Primarily on-site at our Bronx headquarters (155 Food Center Drive), with occasional travel to branch distribution centers as needed
- Participates in the on-call rotation as the senior escalation point for major incidents
- Ability to work additional hours, weekends, or holidays as needed for major incidents and system implementations
Requirements
- Bachelor's degree (Computer Science, MIS, Engineering, or related) preferred, or equivalent work experience
- 5+ years in IT operations, Service Desk, or infrastructure roles, including direct people management; a hands-on leader with a technical background who operates alongside the team
- Previous experience owning or contributing to an incident management program (major incident response, root cause analysis, and post-mortems)
- Working knowledge of reliability principles (service-level targets, observability, and automation)
- Experience with ITSM platforms (Zendesk, Jira, or equivalent), monitoring/observability tooling (Azure Monitor, Prometheus, Grafana, or equivalent), and endpoint management (Microsoft Intune or equivalent MDM)
- Strong analytical and problem-solving skills under pressure; excellent written and verbal communication, with experience presenting operational performance to leadership and partnering across technical and non-technical stakeholders
Preferred Qualifications
- Experience building or maturing a reliability or operational excellence practice, including standing up on-call paging platforms (PagerDuty, Opsgenie, or equivalent)
- Hands-on experience with Microsoft Azure (Azure Monitor, Log Analytics, Action Groups) and automation/scripting (PowerShell, Python, or equivalent)
- Experience supporting distributed operations across multiple sites (warehouse, logistics, or food distribution environments) and familiarity with IT compliance and security frameworks