Linux Systems Engineer

Advance Digital Systems
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Tech stack

Business Analytics Applications
Computing Platforms
Ubuntu (Operating System)
CentOS
Configuration Management
Linux
Distributed Systems
R
Job Scheduling
Python
Linux System Administration
MATLAB
Performance Tuning
Red Hat Enterprise Linux - RHEL
Ansible
SAS (Software)
Shell Script
Stata
Software Vulnerability Management
High Performance Computing
System Availability
Data Management
Slurm
Vulnerability Analysis

Job description

We are seeking an experienced Senior Linux Systems Engineer to support and enhance a Linux-based High-Performance Computing (HPC) environment that underpins advanced statistical modeling and economic research across multiple business units. This platform enables data scientists, economists, and analysts to perform large-scale computations using modern analytical tools. The role focuses on ensuring platform stability, scalability, performance, and security while continuously evolving the environment to meet growing analytical demands.

Position Responsibilities

· Administer, maintain, and optimize Linux-based HPC infrastructure to ensure high availability, performance, and reliability.

· Perform system patching, upgrades, configuration management, and security hardening in alignment with enterprise standards.

· Monitor system health, troubleshoot complex issues, and implement performance tuning across compute, storage, and network resources.

· Provide Tier 3 support for the analytics platform, resolving advanced technical issues and minimizing operational disruptions.

· Design, implement, and manage automation solutions using Ansible and Ansible Automation Platform to streamline system operations.

· Support HPC workload management and user access tools (e.g., SLURM, Open OnDemand) and ensure efficient job scheduling and resource utilization.

· Collaborate with data scientists, economists, and business stakeholders to translate analytical requirements into scalable technical solutions.

· Implement and maintain security controls, conduct vulnerability assessments, and ensure compliance with regulatory and organizational standards.

· Contribute to platform architecture, capacity planning, and continuous improvement initiatives, including system enhancements and new feature deployments.

· Develop and maintain technical documentation, operational procedures, and knowledge base artifacts.

· Participate in an on-call rotation to support critical systems and ensure uninterrupted platform operations.

Requirements

· Strong expertise in Linux system administration (e.g., Red Hat, CentOS, or Ubuntu) and shell scripting.

· Hands-on experience with Ansible and Ansible Automation Platform for configuration management and automation.

· Proven experience supporting High-Performance Computing (HPC) environments, including workload schedulers such as SLURM and user access tools like Open OnDemand.

· Familiarity with statistical and analytical tools such as R, Python, MATLAB, Stata, and SAS within HPC environments.

· Solid understanding of system performance tuning, capacity planning, and troubleshooting in distributed computing environments.

· Experience implementing security best practices, system hardening, and vulnerability management in regulated environments.

· Strong analytical, problem-solving, and troubleshooting skills with a proactive, customer-focused mindset.

· Excellent communication skills with the ability to collaborate effectively across technical and non-technical teams.
