Senior Linux HPC Systems Administrator/Engineer

Cognizant
Charing Cross, United Kingdom
9 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charing Cross, United Kingdom

Tech stack

Apple Mac Systems
Cluster Analysis
Linux
Web Servers
Network Troubleshooting
Linux System Administration
Red Hat Enterprise Linux - RHEL
Scientific Computing
Software Engineering
Graphics Processing Unit (GPU)
Information Technology
Performance Monitor
Slurm
Hardware Infrastructure
ServiceNow

Job description

Enterprise Linux Administration:

  • Administer, configure, and maintain RHEL environments (specifically RHEL 8 & 9) ensuring stability, performance, and security.
  • Provide hands-on support with high-end workstation hardware for scientists, promptly addressing hardware and software issues.

Scientific and HPC Support:

  • Offer technical support to scientific users, bridging the gap between research demands and IT infrastructure.
  • Leverage any scientific computing experience to optimize system performance and manage specialized applications.
  • Assist with the management of high-performance compute resources, including Slurm, clustering, and related HPC technologies.

Collaboration and Stakeholder Management:

  • Work closely with other technical teams and stakeholders to align IT services with organizational needs.
  • Build and maintain strong stakeholder relationships, communicating complex technical concepts clearly.
  • Provide in-person support onsite to ensure effective resolution of issues and a high level of customer satisfaction.

Service Management and Process Improvement:

  • Utilize ServiceNow for tracking incidents, managing change requests, and ensuring timely resolution of service tickets.
  • Implement and follow IT best practices for incident management, performance monitoring, and network troubleshooting.

Additional Technical Duties:

  • Manage SSL certificates and configure web servers as needed.
  • Monitor and troubleshoot system performance issues, including understanding the impact of GPUs, networking, and other hardware components.
  • Handle vendor relationships effectively, coordinating with external partners to resolve issues and optimize service delivery.
  • Maintain familiarity with macOS systems to provide assistance when necessary.

Requirements

We are seeking an experienced Senior Linux HPC Systems Administrator/Engineer with enterprise IT experience to manage and support our critical Linux-based infrastructure.

This role is critical for managing and supporting our advanced computing environments, which are pivotal to scientific research and high-performance computing (HPC) initiatives.

The position requires hands-on expertise with high-end workstation hardware and scientific applications, as well as a strong background in HPC techniques, including clustering and workload management with tools like Slurm.

The ideal candidate will be proficient in Red Hat Enterprise Linux (RHEL 8 & 9) and have experience with scientific and high-performance computing environments. They will also have excellent stakeholder relationship skills and the ability to communicate complex technical concepts effectively to various stakeholders, ensuring our scientists receive top-tier in-person support onsite.

  • Enterprise IT experience with extensive hands-on expertise in Red Hat Enterprise Linux (RHEL), specifically RHEL 8 & 9.
  • Proven experience with high-end workstation hardware setups and scientific application support.
  • Demonstrated knowledge of scientific computing and experience in high performance compute environments, including experience with Slurm and clustering, is highly desirable.
  • Strong troubleshooting skills for both hardware and software issues.

Interpersonal Skills:

  • Excellent communication skills with a proven ability to engage and build relationships with stakeholders at various levels.
  • Experience working collaboratively with other technical teams to resolve complex problems and drive operational improvements.
  • Strong stakeholder relationship building skills and the ability to manage vendor relationships effectively.

Additional Desirable Skills:

  • Working knowledge of ServiceNow and its application in incident and service management.
  • Familiarity with networking concepts, performance monitoring tools, and GPU technologies.
  • Any experience with scientific applications will be a significant advantage.
  • Exposure to macOS environments is useful but not essential.

Onsite Requirement:

  • Must be able to work onsite to provide in-person technical support to scientists and ensure optimal system performance.
