Server Repair Engineering Supervisor

Ultimate Staffing Services
Grapevine, United States of America
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$65K

Job location

Grapevine, United States of America

Tech stack

Artificial Intelligence
Automation of Tests
Intelligent Platform Management Interface
Linux
Microprocessors
Python
LabVIEW
MATLAB
Windows Server
Dell PowerEdge
TensorFlow
Server Administration
Software Systems
Scripting (Bash/Python/Go/Ruby)
Graphics Processing Unit (GPU)
High Performance Computing
PyTorch
Deep Learning
Information Technology
Hardware Infrastructure
Network Server
Server Operating Systems & Platforms

Job description

The Server Repair Engineering Supervisor is responsible for leading both the technical validation of AI server hardware/software systems and the optimization of operational workflows within a high-performance AI Server Service Center. This dual-role position oversees test engineering operations, drives process improvement initiatives, and ensures quality, reliability, and efficiency across service center activities. The role requires strong technical expertise, process-driven leadership, and hands-on experience with AI server technology.

  • Lead a team of test engineers performing diagnostics, validation, and troubleshooting on AI server hardware and software.
  • Establish, implement, and monitor test procedures to ensure compliance with Dell quality standards and internal requirements.
  • Evaluate AI servers for performance, reliability, and functionality using advanced diagnostic tools and methodologies.
  • Develop, refine, and automate testing scripts and procedures for server systems and components.
  • Collaborate closely with Product Development, Quality Assurance, and Engineering teams to identify issues and drive resolution during testing and validation stages.

  • Analyze, design, and improve operational workflows for testing, repair, refurbishment, and upgrades of AI servers.
  • Lead initiatives that enhance throughput, quality, and cost efficiency across service center operations.
  • Conduct root cause analysis for process-related failures and establish robust corrective and preventive action plans.
  • Continuously research industry best practices to ensure alignment with modern process optimization and manufacturing engineering standards.
  • Perform capacity planning to support scalable testing and service operations.

Leadership & Operational Management

  • Supervise and mentor the test engineering (TE) and process engineering (PE) teams, providing guidance, coaching, and technical expertise.
  • Allocate resources, establish project priorities, and ensure timely completion of testing and process-related deliverables.
  • Maintain compliance with quality, environmental, and safety standards (ISO, internal AI standards, regulatory guidelines).
  • Communicate operational updates, challenges, risks, and improvement plans to leadership and cross-functional partners.
  • Serve as the point of escalation for complex technical or operational issues within the service center.

Requirements

  • Bachelor's degree in Electrical Engineering, Industrial Engineering, Computer Science, or related field required; Master's degree preferred.
  • 8-10 years of relevant experience in test engineering, process engineering, or hardware/software system operations.
  • Minimum 3 years of supervisory or technical leadership experience.
  • Strong preference for experience with AI servers, high-performance computing systems, or advanced enterprise server environments.

Preferred Certifications

  • EMC Proven Professional or comparable server/hardware certifications.
  • Six Sigma Green Belt or Black Belt certification (process optimization).
  • Certifications related to AI/ML hardware or data workflows (e.g., Deep Learning Institute credentials).

Essential Skills

  • Expertise in server diagnostics and troubleshooting for CPUs, GPUs, memory, storage, power supplies, and other critical components.
  • Strong working knowledge of AI server platforms (e.g., Dell PowerEdge, NVIDIA DGX) and related AI/ML frameworks such as TensorFlow or PyTorch.
  • Proficiency with process optimization methodologies (Six Sigma, Lean, Kaizen).
  • Experience with test automation tools and scripting languages (Python, MATLAB, LabVIEW, etc.).
  • Familiarity with server management platforms such as iDRAC and IPMI, and operating systems including Linux and Windows Server.
  • Ability to support high-performance computing environments and advanced AI server technologies.
  • Strong analytical, problem-solving, communication, and continuous improvement skills.

About the company

Ultimate Staffing is seeking a Server Repair Engineering Supervisor to join a client in Grapevine, TX. This is a full-time, direct hire position. The role is 100% onsite.
