Solution Architect - AI Factory

NVIDIA
14 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote

Tech stack

Artificial Intelligence
Bash
C++
Cloud Computing
Nvidia CUDA
Data Centers
Software Debugging
Linux
Ethernet
InfiniBand
Python
Open Source Technology
Ansible
Enterprise Software Applications
Kubernetes
Information Technology
Low Latency
Slurm
Docker

Job description

Want to be part of a team that's revolutionizing the field of AI with data-center-scale solutions? We are looking for a hardworking Solution Architect with experience in designing, building, and maintaining large-scale HPC and AI hybrid computing solutions to join our team at NVIDIA. As Solution Architects on the strategic Enterprise AI Factory team, we are actively helping NVIDIA AI Factory solutions bring the benefits of large-scale AI to leading enterprise customers. We work closely with customers and partners to address unsolved problems in the industry and help to deploy and operationalize AI solutions at scale.

Our day-to-day work involves guiding customers in their adoption of NVIDIA's compute, networking, and software stacks to deliver end-to-end GenAI and Agentic AI solutions. Don't think this is a high-level slideshow job: we are the voice of experience, using cloud-native methodologies, low-latency networks, and accelerated compute to help build modern AI factories. We also excel at sharing knowledge with others, whether it's delivering demos, assisting with proofs of concept, or writing papers and developer blogs. By collaborating with executives and engineering, we solve complex problems and help bring NVIDIA's premier technologies to life in the cloud and in the data center. Our mission is to solve the problems that nobody else has solved yet, and we need someone to be an instrumental part of that!

Requirements

  • MS or PhD in Engineering, Computer Science, or a related field (or equivalent experience).
  • Established track record working with AI and HPC clusters, both on-premises and cloud-based.
  • 4+ years of proven experience with cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.
  • Hands-on experience with data center MEP, network, storage, and cluster configuration and debugging.
  • Strong analytical and problem-solving skills, along with an ability to articulate what you know to others.
  • Ability to multitask efficiently in a dynamic environment.

Ways to stand out from the crowd:

  • Strong coding and debugging skills, including experience with CUDA, Python, C/C++, Bash, AI frameworks, and Linux utilities.
  • Demonstrated expertise through projects or Open Source contributions involving GPU workloads, Kubernetes, InfiniBand, Ethernet, or other areas related to high-performance clusters and hybrid cloud solutions.
  • Hands-on experience with NVIDIA Enterprise software products, Base Command Manager, Run:ai, and NVIDIA NIMs.
  • Willingness and ability to learn quickly and solve advanced problems.

About the company

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!
