AI Systems Administrator

Draper, Inc.
Cambridge, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$220K

Job location

Cambridge, United States of America

Tech stack

API
Artificial Intelligence
Computing Platforms
Audit Trail
Bash
Business Process Modeling
Unix
Ubuntu (Operating System)
Configuration Management
System Configuration
Linux
Python
Linux System Administration
Oracle Applications
Package Management Systems
Red Hat Enterprise Linux - RHEL
Cloud Services
Ansible
Prometheus
Runbook
Oracle Linux
Software Vulnerability Management
AI Infrastructure
Scripting (Bash/Python/Go/Ruby)
Large Language Models
Grafana
GIT
SC Clearance
Git Flow
Kubernetes
Information Technology
Network Server
GPT
Data Pipelines

Job description

The AI Systems Administrator is instrumental in bringing AI to Draper. The incumbent implements a closed GPT environment at Draper in which several different LLMs are maintained and used throughout the organization. This role works with engineering to ensure that multiple LLMs are accessible through a chat interface, API, and assistive tools for general use across the organization. In addition, the administrator maintains the health of the DraperGPT server so that additional compute-intensive AI infrastructure can run without degrading the performance of other LLM resources. The work also includes API integrations with various software platforms across Draper (e.g., engineering, accounting, legal). This role helps Draper implement automation, streamline processes, and support mission-critical AI/ML workloads. Resource allocation is critical.

The role also involves traditional Linux administration duties (installing, configuring, and securing servers; scripting; monitoring), but with a strong focus on supporting and managing AI/ML infrastructure (e.g., GPU servers, Kubernetes, data pipelines). The administrator supports AI engineers, guiding them with solutions and recommendations. The role is part of a team of Linux system administrators responsible for the functionality and efficiency of approximately 750 computers running primarily Oracle Linux. Knowledge of additional operating systems, e.g., Ubuntu and RHEL, may be necessary.

The Systems Administrator maintains the integrity and security of servers and systems, serves as a front-line interface to end users and other IS teams, and makes recommendations for hardware and software purchases. The administrator interacts directly with vendors and VARs, both on proactive projects and in response to support issues. Duties may include installing, configuring, and maintaining new hardware and software, troubleshooting, managing permissions, and training other administrators. The role requires a solid understanding of UNIX-based operating systems.

This role will be hybrid (3 days/week) in Cambridge, MA and will require an active Secret clearance.

  • Build, operate, and troubleshoot RHEL/Oracle systems supporting GPU workloads (OS lifecycle, patching, performance, reliability).
  • Manage the GPU enablement layer: driver/toolkit lifecycle, kernel/driver compatibility, coordinated upgrades and rollback plans, and ongoing health monitoring.
  • Implement and maintain observability (metrics, logs, alerting) for system, GPU, and storage performance/health (e.g., Prometheus/Grafana and GPU telemetry such as DCGM/NVML or equivalent).
  • Correlate this observability data with LLM performance and usage, and identify and warn users who are over-allocating resources.
  • Maintain (i.e., reset or rebuild) LLM servers to ensure high uptime and availability across the organization.
  • Work with a team of engineers to roll out software upgrades (e.g., new models or additional AI software) to the server while meeting security requirements.
  • Partner with storage/network peers to baseline throughput/latency, identify bottlenecks, and tune the platform for predictable performance.
  • Automation & scripting: create and maintain automation in Python and Ansible for platform administration and broader Linux team workflows (provisioning/config enforcement, patch orchestration, reporting, routine maintenance), using Git-based practices.
  • Support various Linux and cloud (AWS/Azure) projects.
  • Lead projects, including large-scale migrations as well as platform redesign and implementation. Draw on resources within the Linux team and across the IS department to reach goals.

Requirements

  • Strong production Linux administration experience (RHEL/Oracle preferred): systemd, networking, troubleshooting, performance analysis, patching, package management.
  • Strong automation skills: Bash and/or Python, plus Ansible (preferred) or equivalent configuration management; comfortable with CI/Git workflows.
  • Experience supporting enterprise platforms (incident response, root-cause analysis, postmortems, runbooks/documentation).
  • Security-minded operations in regulated environments; familiarity with CUI handling concepts and control expectations (audit logging, vulnerability remediation, change control).
  • Bachelor's degree in Computer Science or a related field.
  • 3 years' experience in Linux system administration, supporting production systems and core utility services in a complex enterprise environment.

About the company

Draper is an independent, nonprofit research and development company headquartered in Cambridge, MA. The 2,000+ employees of Draper tackle important national challenges with a promise of delivering successful and usable solutions. From military defense and space exploration to biomedical engineering, lives often depend on the solutions we provide. Our multidisciplinary teams of engineers and scientists work in a collaborative environment that inspires the cross-fertilization of ideas necessary for true innovation. For more information about Draper, visit www.draper.com.

Our work is very important to us, but so is our life outside of work. Draper supports many programs to improve work-life balance, including workplace flexibility, employee clubs ranging from photography to yoga, health and finance workshops, off-site social events, and discounts to local museums and cultural activities. If this specific job opportunity and the chance to work at a nationally renowned R&D innovation company appeal to you, apply now at www.draper.com/careers.