Solution Architect - AI Factory

Nvidia

14 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Remote

Tech stack

Artificial Intelligence

Bash

C++

Cloud Computing

Nvidia CUDA

Data Centers

Software Debugging

Linux

Ethernet

InfiniBand

Python

Open Source Technology

Ansible

Enterprise Software Applications

Kubernetes

Information Technology

Low Latency

Slurm

Docker

Job description

Want to be part of a team that's revolutionizing the field of AI with data center scale solutions? We are looking for a hardworking Solution Architect with experience in designing, building, and maintaining large-scale HPC and AI hybrid computing solutions to join our team at NVIDIA. As Solution Architects on the strategic Enterprise AI Factory team, we actively help NVIDIA AI Factory solutions bring the benefits of large-scale AI to leading enterprise customers. We work closely with customers and partners to address unsolved problems in the industry and help deploy and operationalize AI solutions at scale.

Our day-to-day work involves guiding customers in their adoption of NVIDIA's compute, networking, and software stacks to deliver end-to-end GenAI and Agentic AI solutions. Don't think this is a high-level slideshow job - we are the voice of experience, using cloud-native methodologies, low-latency networks, and accelerated compute to help build modern AI factories. We also excel at sharing knowledge with others, whether it's delivering demos, assisting with proofs of concept, or writing papers and developer blogs. By collaborating with executives and engineering, we solve complex problems and help bring NVIDIA's premier technologies to life in the cloud and in the data center. Our mission is to solve the problems that nobody else has solved yet, and we need someone to be an instrumental part of that!

Requirements

MS or PhD in Engineering, Computer Science, or a related field (or equivalent experience).
Established track record working with AI and HPC clusters, both on-premises and cloud-based.
4+ years of proven experience with cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.
Hands-on experience with data center MEP (mechanical, electrical, and plumbing), networking, storage, and cluster configuration and debugging.
Strong analytical and problem-solving skills, along with an ability to articulate what you know to others.
Ability to multitask efficiently in a dynamic environment.

Ways to stand out from the crowd:

Strong coding and debugging skills, including experience with CUDA, Python, C/C++, Bash, AI frameworks, and Linux utilities.
Demonstrated expertise through projects or Open Source contributions involving GPU workloads, Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions.
Hands-on experience with NVIDIA Enterprise software products, including Base Command Manager, Run:ai, and NVIDIA NIMs.
Willingness and ability to learn quickly and solve advanced problems.

About the company

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!