HPC AI Systems Administrator
Job description
This position will support government accounts. Therefore, due to federal export-control regulations, the selected candidate must be a U.S. citizen, hold U.S. lawful permanent resident (Green Card) status, or otherwise hold refugee/asylee status that enables them to perform the role without requiring a license under the International Traffic in Arms Regulations (ITAR) or the Export Administration Regulations (EAR).

The Data Center Administration team is seeking a Senior System Administrator to provide advanced system administration and lab operations support for the hardware, network, and software environments used by the HPE HPC & AI Performance Engineering teams. These environments support internal product development, performance engineering, ISV validation, and customer-facing sales and benchmarking activities.

This role serves as a senior technical contributor and lab expert, providing design guidance, operational leadership, and escalation-level troubleshooting across complex HPC and AI lab environments. The position partners closely with engineering teams, infrastructure support groups, and external partners to ensure lab stability, availability, and effective use of resources. The Senior System Administrator contributes to continuous improvement of lab processes, policies, and standards, prioritizes lab requests, mentors junior staff, and supports future lab expansion and facility transitions.

- Image, configure, and upgrade servers running Linux operating systems, including firmware updates and switch configuration, to support lab environments.
- Configure and manage multiple root slots hosting varied operating system images in support of HPC cluster provisioning, validation, and testing workflows.
- Provide design guidance and operational support for virtualized lab infrastructure, including virtual server administration and the design of highly available, fault-tolerant environments.
- Provide design guidance for lab storage solutions, including installation, configuration, and performance management of high-performance storage systems (e.g., Lustre) to support sales, benchmarking, and partner activities.
- Provide guidance for hardware and software installation and configuration, including advanced hardware diagnostics and coordination with infrastructure support teams to resolve power, CPU, and GPU issues.
- Collaborate with AI benchmarking, R&D, and performance engineering teams to design and operate lab environments that meet internal, partner, and customer requirements.
- Design lab layouts, networks, and operational policies that meet functional needs while adhering to cybersecurity and asset protection standards.
- Prioritize and coordinate lab work activities to ensure timely delivery of high-impact requests and effective utilization of lab resources.
- Make recommendations on lab resource usage, capacity planning, and future expansion to support evolving business and engineering needs.
- Oversee and support lab transitions, including facility moves and infrastructure refresh activities.
- Install, configure, and support job scheduling and resource management tools to maximize lab utilization.
- Serve as a technical mentor to junior system administrators and lab staff, providing guidance on best practices, troubleshooting, and operational standards.
- Communicate lab successes, risks, failures, and issues to management in a timely and professional manner.
- Work effectively with remote administrators, vendors, and partners when specialized expertise or additional support is required.
Requirements
- Communication - Communicates clearly and effectively in both written and verbal forms; collaborates well with diverse technical teams.
- Creativity / Innovation - Applies creative problem-solving approaches and contributes to continuous improvement of lab processes and capabilities.
- Customer Service - Demonstrates a service-oriented mindset when supporting internal teams, partners, and stakeholders.
- Job Knowledge - Maintains deep technical knowledge of Linux systems, lab operations, and HPC/AI infrastructure.
- Problem Solving / Analysis - Breaks down complex technical issues, identifies root causes, and develops effective solutions.
- Quality - Demonstrates attention to detail, accuracy, and reliability.
- Technical Skills - Strong expertise in Linux system administration with working knowledge of networking, storage, virtualization, and hardware platforms.
- Bachelor's degree in Computer Science, MIS, or a related technical field required, with an emphasis on system administration.
- Minimum of 8-10 years of Linux system administration experience required, preferably in HPC, AI, or lab-based environments.
- Candidates with strong Linux or network administration backgrounds and demonstrated interest in advanced lab system administration will also be considered.
- This role works as part of a team of system administrators and lab staff and reports to the Data Center Administration Manager.
Benefits & conditions
This role has been designated as 'Onsite', with an expectation that you will primarily work from an HPE office.

The expected salary/wage range for this position is provided below. Actual offers may vary from this range based on geographic location, work experience, education/training, and/or skill level.

Annual salary ranges, United States of America (USD):
- Colorado: 111,000 - 211,000
- California: 120,000 - 243,000
- Minnesota, Texas, and Wisconsin: 105,500 - 243,000

The listed salary ranges reflect base salary. Variable incentives may also be offered.