Adolf Hohl
Efficient deployment and inference of GPU-accelerated LLMs
#1 · about 2 minutes
The evolution of generative AI from experimentation to production
Generative AI has rapidly moved from experimentation with models like Llama and Mistral to production-ready applications in 2024.
#2 · about 3 minutes
Comparing managed AI services with the DIY approach
Managed services offer ease of use but limited control, while a do-it-yourself approach provides full control but introduces significant complexity.
#3 · about 4 minutes
Introducing NVIDIA NIM for simplified LLM deployment
NVIDIA Inference Microservices (NIM) provide a containerized, OpenAI-compatible solution for deploying models anywhere with enterprise support.
#4 · about 2 minutes
Boosting inference throughput with lower precision quantization
Using lower precision formats like FP8 dramatically increases model inference throughput, providing more performance for the same hardware investment.
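A back-of-the-envelope sketch of why this works, assuming a hypothetical 70B-parameter model (the figure is illustrative, not from the talk): decode-time generation is largely memory-bandwidth bound, so halving the bytes per weight roughly doubles attainable throughput.

```python
# Weight-memory math for a hypothetical 70B-parameter model.
params = 70e9

bytes_fp16 = params * 2  # FP16/BF16: 2 bytes per weight
bytes_fp8 = params * 1   # FP8 (E4M3/E5M2): 1 byte per weight

print(f"FP16 weights: {bytes_fp16 / 1e9:.0f} GB")  # ~140 GB
print(f"FP8 weights:  {bytes_fp8 / 1e9:.0f} GB")   # ~70 GB

# Token generation is dominated by streaming the weights through the GPU
# on every decode step, so FP8 roughly halves memory traffic per token
# and frees memory for larger batches or longer KV caches.
```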
#5 · about 2 minutes
Overview of the NVIDIA AI Enterprise software platform
The NVIDIA AI Enterprise platform is a cloud-native software stack that abstracts away low-level details such as CUDA programming to streamline AI pipeline development.
#6 · about 2 minutes
A look inside the NIM container architecture
NIM containers bundle optimized inference tools like TensorRT-LLM and Triton Inference Server to accelerate models on specific GPU hardware.
#7 · about 3 minutes
How to run and interact with a NIM container
A NIM container can be launched with a simple Docker command, automatically discovering hardware and exposing OpenAI-compatible API endpoints for interaction.
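A minimal sketch of that flow. The image name, port, and model ID follow the documented pattern for the Llama 3 8B NIM but should be verified against the NIM catalog for your model; the launch command appears as a comment because it runs in the shell.

```python
# Launch (shell), assuming an NGC_API_KEY in the environment:
#   docker run --gpus all -e NGC_API_KEY -p 8000:8000 \
#     nvcr.io/nim/meta/llama3-8b-instruct:latest
#
# Once the container is up, any OpenAI-compatible client can talk to it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "What does NIM do, in one sentence?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```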
#8 · about 2 minutes
Efficiently serving custom models with LoRA adapters
NIM enables serving multiple customized LoRA adapters on a single base model simultaneously, saving memory while providing distinct model endpoints.
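From the client side this looks like ordinary model selection. A minimal sketch, assuming the container above is serving two hypothetical LoRA adapter names on the same base model; only the model field changes per request.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Hypothetical adapter names; each routes to a LoRA applied on top of
# the shared base weights rather than to a separately loaded model.
for adapter in ("llama3-8b-sql", "llama3-8b-support"):
    reply = client.chat.completions.create(
        model=adapter,
        messages=[{"role": "user", "content": "Which task are you tuned for?"}],
        max_tokens=64,
    )
    print(f"{adapter}: {reply.choices[0].message.content}")
```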
#9 · about 3 minutes
How NIM automatically handles hardware and model optimization
NIM simplifies deployment by automatically selecting the best pre-compiled model based on the detected GPU architecture and user preference for latency or throughput.
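The automatic choice can also be pinned. A sketch assuming the NIM_MODEL_PROFILE override that NIM containers document for selecting a specific engine profile; the profile ID below is illustrative, not a real one.

```python
import subprocess

# Relaunch the container with an explicitly pinned profile instead of
# letting NIM pick one from the detected GPU. The profile ID here is
# illustrative; real IDs can be listed from inside the container.
subprocess.run([
    "docker", "run", "--gpus", "all",
    "-e", "NGC_API_KEY",
    "-e", "NIM_MODEL_PROFILE=tensorrt_llm-h100-fp8-tp1-latency",
    "-p", "8000:8000",
    "nvcr.io/nim/meta/llama3-8b-instruct:latest",
], check=True)
```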