Aarno Aukia
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow
#1about 3 minutes
Applying DevOps principles to machine learning operations
The maturity of software operations from reactive firefighting to automated DevOps provides a model for improving current MLOps practices.
#2about 3 minutes
Defining AI, machine learning, and generative AI
AI is a broad concept that has evolved through machine learning and deep learning to the latest trend of generative AI, which can create new content.
#3about 4 minutes
How large language models generate text with tokens
LLMs work by converting text into numerical tokens and then using a large statistical model to predict the most probable next token in a sequence.
#4about 2 minutes
Using prompt engineering to guide LLM responses
Prompt engineering involves crafting detailed instructions and providing context within a prompt to guide the LLM toward a desired and accurate answer.
#5about 2 minutes
Understanding and defending against prompt injection attacks
User-provided input can be manipulated to bypass instructions or extract sensitive information, requiring defensive measures against prompt injection.
#6about 3 minutes
Advanced techniques like RAG and model fine-tuning
Beyond basic prompts, you can use Retrieval-Augmented Generation (RAG) to add dynamic context or fine-tune a model with specific data for better performance.
#7about 5 minutes
Choosing between cloud APIs and self-hosted models
LLMs can be consumed via managed cloud APIs, which are simple but opaque, or by self-hosting open-source models for greater control and data privacy.
#8about 2 minutes
Streamlining local development with the Ollama tool
Ollama simplifies running open-source LLMs on a local machine for development by managing model downloads and hardware acceleration, acting like Docker for LLMs.
#9about 6 minutes
Running LLMs in production with Kubeflow and KServe
Kubeflow and its component KServe provide a robust, Kubernetes-native framework for deploying, scaling, and managing LLMs in a production environment.
#10about 2 minutes
Monitoring LLM performance with KServe's observability tools
KServe integrates with tools like Prometheus and Grafana to provide detailed metrics and dashboards for monitoring LLM response times and resource usage.
Related jobs
Jobs that call for the skills explored in this talk.
Featured Partners
Related Videos
The state of MLOps - machine learning in production at enterprise scale
Bas Geerdink
DevOps for Machine Learning
Hauke Brammer
Creating Industry ready solutions with LLM Models
Vijay Krishan Gupta & Gauravdeep Singh Lotey
From Traction to Production: Maturing your LLMOps step by step
Maxim Salnikov
Effective Machine Learning - Managing Complexity with MLOps
Simon Stiebellehner
Multilingual NLP pipeline up and running from scratch
Kateryna Hrytsaienko
Data Privacy in LLMs: Challenges and Best Practices
Aditi Godbole
How to Avoid LLM Pitfalls - Mete Atamel and Guillaume Laforge
Meta Atamel & Guillaume Laforge
From learning to earning
Jobs that call for the skills explored in this talk.
Machine Learning (IA/ML) + DevOps (MLOps)
Alten
Municipality of Madrid, Spain
Remote
Java
DevOps
Python
Kubernetes
+3
ML/DevOps Engineer at dynamic AI/ Computer Vision company
Nomitri
Berlin, Germany
C++
Bash
Azure
DevOps
Python
+12
Machine Learning Engineer, MLOps/GenAI, Engine AI Center of Excellence (AICE)
Amazon.com, Inc
Berlin, Germany
Machine Learning
Natural Language Processing
AI Engineer / Machine Learning Engineer / KI-Entwickler (m/w/d) - Schwerpunkt Cloud & MLOps
Agenda GmbH
Rosenheim, Germany
Intermediate
API
Azure
Python
Docker
PyTorch
+9
AI/ML Team Lead - Generative AI (LLMs, AWS)
Provectus
Canton de Saint-Mihiel, France
Remote
€96K
Senior
Python
PyTorch
TensorFlow
+4


