Cloud / DevOps Engineer, AI Compute Infrastructure
Job description
Richtech Robotics is looking for a Cloud / DevOps Engineer to support our AI compute infrastructure services. This role will help deploy, manage, and support cloud-based GPU environments for customers building AI models, robotics applications, simulation workflows, and Physical AI systems. The ideal candidate has strong Linux, networking, cloud infrastructure, and DevOps experience, with a willingness to learn GPU computing, CUDA environments, and AI workload deployment.
Responsibilities
- Deploy and manage cloud-based GPU compute environments for customer workloads.
- Configure virtual networks, VPNs, firewalls, security groups, SSH access, storage, and user permissions.
- Build and maintain Linux-based environments for AI development, including Docker containers, CUDA drivers, Python environments, and Jupyter notebooks.
- Work with AI engineers to deploy required runtime environments for model training, fine-tuning, simulation, and inference.
- Monitor GPU usage, system performance, uptime, storage, and network connectivity.
- Troubleshoot customer issues related to access, environment setup, networking, storage, and compute availability.
- Create reusable deployment scripts, images, templates, and technical documentation.
- Coordinate with cloud infrastructure partners and internal teams to resolve technical issues.
Requirements
- 2+ years of experience in cloud infrastructure, DevOps, systems administration, or network engineering.
- Strong Linux administration skills.
- Solid understanding of networking, including TCP/IP, VPN, DNS, firewalls, routing, security groups, and private networks.
- Experience with Docker and containerized environments.
- Experience with at least one major cloud platform or private cloud environment.
- Familiarity with monitoring, logging, automation, and scripting.
- Ability to troubleshoot infrastructure issues independently.
- Strong communication skills and willingness to support customer-facing technical requests.
- Interest in learning GPU computing, CUDA environments, and AI infrastructure.
Preferred Qualifications
- Experience deploying NVIDIA GPU drivers, CUDA, cuDNN, or NVIDIA Container Toolkit.
- Familiarity with PyTorch, TensorFlow, Hugging Face, Jupyter, or vLLM.
- Experience with Slurm or distributed compute environments.
- Experience with Prometheus, Grafana, ELK, or similar monitoring tools.
- Prior experience supporting AI/ML, data science, robotics, or simulation workloads.
Benefits & conditions
- Health insurance
- Paid time off
- Vision insurance
- Dental insurance