Cloud / DevOps Engineer, AI Compute Infrastructure

Richtech Robotics Inc.
Las Vegas, United States of America
9 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$120K

Job location

Las Vegas, United States of America

Tech stack

Private Networks
Artificial Intelligence
Cloud Computing
Nvidia CUDA
System Configuration
Linux
DevOps
DNS
General-Purpose Computing on Graphics Processing Units
Monitoring of Systems
Virtual Private Networks (VPN)
Python
Linux System Administration
Machine Learning
Uptime
Network Connections
Routing
Performance Tuning
TensorFlow
Prometheus
TCP/IP
AI Infrastructure
Jupyter Notebook
Private Cloud Environment
Data Logging
Scripting (Bash/Python/Go/Ruby)
Cloud Platform System
PyTorch
Grafana
Firewalls (Computer Science)
Jupyter
Containerization
Hugging Face
Slurm
Docker

Job description

Richtech Robotics is looking for a Cloud / DevOps Engineer to support our AI compute infrastructure services. This role will help deploy, manage, and support cloud-based GPU environments for customers building AI models, robotics applications, simulation workflows, and Physical AI systems. The ideal candidate has strong Linux, networking, cloud infrastructure, and DevOps experience, with a willingness to learn GPU computing, CUDA environments, and AI workload deployment.

Responsibilities

  • Deploy and manage cloud-based GPU compute environments for customer workloads.

  • Configure virtual networks, VPNs, firewalls, security groups, SSH access, storage, and user permissions.

  • Build and maintain Linux-based environments for AI development, including Docker containers, CUDA drivers, Python environments, and Jupyter notebooks.

  • Work with AI engineers to deploy required runtime environments for model training, fine-tuning, simulation, and inference.

  • Monitor GPU usage, system performance, uptime, storage, and network connectivity.

  • Troubleshoot customer issues related to access, environment setup, networking, storage, and compute availability.

  • Create reusable deployment scripts, images, templates, and technical documentation.

  • Coordinate with cloud infrastructure partners and internal teams to resolve technical issues.

Requirements

Do you have experience in technical troubleshooting support?

  • 2+ years of experience in cloud infrastructure, DevOps, systems administration, or network engineering.

  • Strong Linux administration skills.

  • Solid understanding of networking, including TCP/IP, VPN, DNS, firewalls, routing, security groups, and private networks.

  • Experience with Docker and containerized environments.

  • Experience with at least one major cloud platform or private cloud environment.

  • Familiarity with monitoring, logging, automation, and scripting.

  • Ability to troubleshoot infrastructure issues independently.

  • Strong communication skills and willingness to support customer-facing technical requests.

  • Interest in learning GPU computing, CUDA environments, and AI infrastructure.

Preferred Qualifications

  • Experience deploying NVIDIA GPU drivers, CUDA, cuDNN, or NVIDIA Container Toolkit.

  • Familiarity with PyTorch, TensorFlow, Hugging Face, Jupyter, or vLLM.

  • Experience with Slurm or distributed compute environments.

  • Experience with Prometheus, Grafana, ELK, or similar monitoring tools.

  • Prior experience supporting AI/ML, data science, robotics, or simulation workloads.

Benefits & conditions

  • Health insurance
  • Paid time off
  • Vision insurance
  • Dental insurance