LLMOps-driven fine-tuning, evaluation, and inference with NVIDIA NIM & NeMo Microservices

What if deploying custom LLMs was fully automated? Learn to build a repeatable, end-to-end pipeline from fine-tuning to inference with NVIDIA NeMo and NIM.

#1about 6 minutes

Understanding the GenAI lifecycle and its operational challenges

The continuous cycle of data processing, model customization, and deployment for GenAI applications creates production complexities like a lack of standardized CI/CD and versioning.

#2about 2 minutes

Breaking down the structured stages of an LLMOps pipeline

An effective LLMOps process moves a model from an experimental proof-of-concept through evaluation, pre-production testing, and finally to a production environment.

#3about 4 minutes

Introducing the NVIDIA NeMo microservices and ecosystem tools

NVIDIA provides a suite of tools including NeMo Curator, Customizer, Evaluator, and NIM, which integrate with ecosystem components like Argo Workflows and Argo CD for a complete LLMOps solution.

#4about 4 minutes

Using NeMo Customizer and Evaluator for model adaptation

NeMo Customizer and Evaluator simplify model adaptation through API requests that trigger fine-tuning on custom datasets and benchmark the resulting model's performance.

#5about 3 minutes

Deploying and scaling models with NVIDIA NIM on Kubernetes

NVIDIA NIM packages models into optimized inference containers that can be deployed and auto-scaled on Kubernetes using the NIM operator, with support for multiple fine-tuned adapters.

#6about 4 minutes

Automating complex LLM workflows with Argo Workflows

Argo Workflows enables the creation of automated, multi-step pipelines by stitching together containerized tasks for data processing, model customization, evaluation, and deployment.

#7about 3 minutes

Implementing a GitOps approach for end-to-end LLMOps

Using Git as the single source of truth, Argo CD automates the deployment and management of all LLMOps components, including microservices and workflows, onto Kubernetes clusters.

#8about 3 minutes

Demonstrating the automated LLMOps pipeline in action

A practical demonstration shows how Argo CD manages deployed services and how a data scientist can launch a complete fine-tuning workflow through the Argo Workflows UI, with results tracked in MLflow.