Cloud / DevOps Engineer, AI Compute Infrastructure

Richtech Robotics Inc.

Las Vegas, United States of America

2 months ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Compensation

$120K

Job location

Las Vegas, United States of America

Tech stack

Private Networks

Artificial Intelligence

Cloud Computing

Nvidia CUDA

System Configuration

Linux

DevOps

DNS

General-Purpose Computing on Graphics Processing Units

Monitoring of Systems

Virtual Private Networks (VPN)

Python

Linux System Administration

Machine Learning

Uptime

Network Connections

Routing

Performance Tuning

TensorFlow

Prometheus

TCP/IP

AI Infrastructure

Jupyter Notebook

Private Cloud Environment

Data Logging

Scripting (Bash/Python/Go/Ruby)

Cloud Platform System

PyTorch

Grafana

Firewalls (Computer Science)

Jupyter

Containerization

HuggingFace

Slurm

Docker

Job description

Richtech Robotics is looking for a Cloud / DevOps Engineer to support our AI compute infrastructure services. This role will help deploy, manage, and support cloud-based GPU environments for customers building AI models, robotics applications, simulation workflows, and Physical AI systems. The ideal candidate has strong Linux, networking, cloud infrastructure, and DevOps experience, with a willingness to learn GPU computing, CUDA environments, and AI workload deployment.

Responsibilities

Deploy and manage cloud-based GPU compute environments for customer workloads.
Configure virtual networks, VPNs, firewalls, security groups, SSH access, storage, and user permissions.
Build and maintain Linux-based environments for AI development, including Docker containers, CUDA drivers, Python environments, and Jupyter notebooks.
Work with AI engineers to deploy required runtime environments for model training, fine-tuning, simulation, and inference.
Monitor GPU usage, system performance, uptime, storage, and network connectivity.
Troubleshoot customer issues related to access, environment setup, networking, storage, and compute availability.
Create reusable deployment scripts, images, templates, and technical documentation.
Coordinate with cloud infrastructure partners and internal teams to resolve technical issues.

Requirements

Experience in technical troubleshooting support.
2+ years of experience in cloud infrastructure, DevOps, systems administration, or network engineering.
Strong Linux administration skills.
Solid understanding of networking, including TCP/IP, VPN, DNS, firewalls, routing, security groups, and private networks.
Experience with Docker and containerized environments.
Experience with at least one major cloud platform or private cloud environment.
Familiarity with monitoring, logging, automation, and scripting.
Ability to troubleshoot infrastructure issues independently.
Strong communication skills and willingness to support customer-facing technical requests.
Interest in learning GPU computing, CUDA environments, and AI infrastructure.

Preferred Qualifications

Experience deploying NVIDIA GPU drivers, CUDA, cuDNN, or NVIDIA Container Toolkit.
Familiarity with PyTorch, TensorFlow, Hugging Face, Jupyter, or vLLM.
Experience with Slurm or distributed compute environments.
Experience with Prometheus, Grafana, ELK, or similar monitoring tools.
Prior experience supporting AI/ML, data science, robotics, or simulation workloads.

Benefits & conditions

Health insurance
Paid time off
Vision insurance
Dental insurance
