Roberto Carratalá, Cedric Clyburn

Aug 20, 2025 • World Congress 2025

Self-Hosted LLMs: From Zero to Inference

Stop sending your private data to third-party AI. This talk shows you how to self-host powerful language models for complete control and security.

#1about 3 minutes

The rise of self-hosted open source AI models

Self-hosting large language models offers developers greater privacy, cost savings, and control compared to third-party cloud AI services.

#2about 2 minutes

Key benefits of local LLM deployment for developers

Running models locally improves the development inner loop, provides full data privacy, and allows for greater customization and control over the AI stack.

#3about 3 minutes

Comparing open source tools for serving LLMs

Explore different open source tools like Ollama for local development, vLLM for scalable production, and Podman AI Lab for containerized AI applications.

#4about 3 minutes

How to select the right open source LLM

Navigate the vast landscape of open source models by understanding different model families, their specific use cases, and naming conventions.

#5about 3 minutes

Using quantization to run large models locally

Model quantization compresses LLMs to reduce their memory footprint, enabling them to run efficiently on consumer hardware like laptops with CPUs or GPUs.

#6about 1 minute

Strategies for integrating local LLMs with your data

Learn three key methods for connecting local models to your data: Retrieval-Augmented Generation (RAG), local code assistants, and building agentic applications.

#7about 6 minutes

Demo: Building a RAG system with local models

Use Podman AI Lab to serve a local LLM and connect it to AnythingLLM to create a question-answering system over your private documents.

#8about 5 minutes

Demo: Setting up a local AI code assistant

Integrate a self-hosted LLM with the Continue VS Code extension to create a private, offline-capable AI pair programmer for code generation and analysis.

#9about 4 minutes

Demo: Building an agentic app with external tools

Create an agentic application that uses a local LLM with external tools via the Model Context Protocol (MCP) to perform complex, multi-step tasks.

#10about 1 minute

Conclusion and the future of open source AI

Self-hosting provides a powerful, private, and customizable alternative to third-party services, highlighting the growing potential of open source AI for developers.

yesterday

AI Software Engineer (m/f/d)

Sunhat
Köln, Germany

Remote

Senior

13 days ago

Senior Machine Learning Engineer (f/m/d)

MARKT-PILOT GmbH
Stuttgart, Germany

Remote

Senior

7 days ago

Senior Researcher for Generative AI

Dynatrace
Linz, Austria

Senior

Featured Partners

Unveiling the Magic: Scaling Large Language Models to Serve Millions

Unveiling the Magic: Scaling Large Language Models to Serve Millions

Patrick Koss

about 2 months ago • World Congress 2025

Inside the Mind of an LLM

Inside the Mind of an LLM

Emanuele Fabbiani

about 2 months ago • World Congress 2025

Exploring LLMs across clouds

Exploring LLMs across clouds

Tomislav Tipurić

about 2 months ago • World Congress 2025

Unlocking the Power of AI: Accessible Language Model Tuning for All

Unlocking the Power of AI: Accessible Language Model Tuning for All

Cedric Clyburn, Legare Kerrison

about a year ago • World Congress 2024

Three years of putting LLMs into Software - Lessons learned

Three years of putting LLMs into Software - Lessons learned

Simon A.T. Jiménez

about 2 months ago • World Congress 2025

One AI API to Power Them All

One AI API to Power Them All

Roberto Carratalá

about 2 months ago • World Congress 2025

DevOps for AI: running LLMs in production with Kubernetes and KubeFlow

DevOps for AI: running LLMs in production with Kubernetes and KubeFlow

Aarno Aukia

about a year ago • WeAreDevelopers LIVE

How to Avoid LLM Pitfalls - Mete Atamel and Guillaume Laforge

How to Avoid LLM Pitfalls - Mete Atamel and Guillaume Laforge

Meta Atamel, Guillaume Laforge

about 6 months ago • Coffee With Developers

From learning to earning

Jobs that call for the skills explored in this talk.

Senior Backend Engineer – AI Integration (m/w/x)

1 month ago

Senior Backend Engineer – AI Integration (m/w/x)

chatlyn GmbH
Vienna, Austria

Senior

JavaScript

AI-assisted coding tools

4 days ago

AI Systems Engineer - LLM Execution

OpenNebula Systems
Municipality of Madrid, Spain

Python

4 days ago

Agentic AI Architect - Python, LLMs & NLP

FRG Technology Consulting

Intermediate

Azure

Python

Machine Learning

today

AI/ML Team Lead - Generative AI (LLMs, AWS)

Provectus
Canton de Saint-Mihiel, France

Remote

€96K

Senior

Python

PyTorch

TensorFlow

+4

today

LLM-AI Engineer | Python | Arquitecturas RAG (100% remoto)

Diverger
Municipality of Madrid, Spain

Azure

Python

Amazon Web Services (AWS)

4 days ago

AI/ML Team Lead - Generative AI (LLMs, AWS)

Provectus
Canton de Saint-Mihiel, France

Remote

€96K

Senior

Python

PyTorch

TensorFlow

+4

4 days ago

AI Evaluation Data Scientist - AI/ML/LLM - (Hybrid (Hybrid) - Barcelona

European Tech Recruit
Barcelona, Spain

Intermediate

GIT

Python

Pandas

Docker

PyTorch

+2

4 days ago

Security-by-Design for Trustworthy Machine Learning Pipelines

Association Bernard Gregory

Machine Learning

Continuous Delivery