Ekaterina Sirazitdinova
Multimodal Generative AI Demystified
#1 · about 2 minutes
The shift from specialized AI to multimodal foundation models
Training a separate specialized model, such as a CNN, for every task does not scale toward general intelligence, which has driven the rise of multimodal foundation models trained on internet-scale data.
#2 · about 3 minutes
Demonstrating the power of multimodal models like GPT-4
GPT-4 achieves high accuracy on zero-shot tasks and shows substantial performance gains by incorporating vision, even enabling it to reason about humor in images.
#3 · about 7 minutes
How multimodal generative AI is transforming industries
Generative AI offers practical applications across education, healthcare, engineering, and entertainment, from personalized learning to interactive virtual characters.
#4 · about 2 minutes
Understanding the core concepts of generative AI
Generative AI creates new content by learning patterns from existing data using a foundation model, which is a large transformer trained to predict the next element in a sequence.
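To make "predict the next element in a sequence" concrete, here is a toy autoregressive generation loop in PyTorch. This is not code from the talk: the vocabulary is made up, and the mean-pooled embedding is a placeholder standing in for a real transformer, but the sampling loop has the same shape as real text generation.

```python
# Toy sketch of autoregressive next-token prediction (hypothetical
# vocabulary and placeholder model; real foundation models have
# billions of parameters, but the generation loop looks like this).
import torch

vocab = ["<bos>", "the", "cat", "sat", "on", "mat", "."]
torch.manual_seed(0)
embed = torch.nn.Embedding(len(vocab), 16)
head = torch.nn.Linear(16, len(vocab))   # maps hidden state -> vocabulary logits

tokens = [0]                             # start with the <bos> token
for _ in range(5):
    # Stand-in for a transformer: pool the embeddings of the context.
    hidden = embed(torch.tensor(tokens)).mean(dim=0)
    logits = head(hidden)
    # Sample the next token from the predicted distribution.
    next_id = torch.distributions.Categorical(logits=logits).sample().item()
    tokens.append(next_id)

print(" ".join(vocab[t] for t in tokens[1:]))
```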
#5 · about 7 minutes
A technical breakdown of the transformer architecture
The transformer architecture processes text by converting it into numerical embeddings and uses self-attention layers in its encoder-decoder structure to understand context.
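The heart of that architecture is self-attention. Below is a minimal single-head scaled dot-product attention sketch (illustrative only; real transformers stack many multi-head attention layers with residual connections and feed-forward blocks):

```python
# Minimal scaled dot-product self-attention for one head.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.shape[-1])
    weights = torch.softmax(scores, dim=-1)  # how much each token attends to every other
    return weights @ v                       # context-aware representations

torch.manual_seed(0)
seq_len, d = 4, 8                            # 4 token embeddings of dimension 8
x = torch.randn(seq_len, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([4, 8])
```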
#6 · about 3 minutes
An introduction to diffusion models for image generation
Modern image generation relies on diffusion models, which create high-quality images by learning to progressively remove noise from a random starting point.
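As a rough illustration of the mechanics (not code from the talk), this sketch shows the forward noising process of a DDPM-style diffusion model. A denoising network is trained to predict the injected noise, which at generation time lets it walk back from pure noise to an image:

```python
# Sketch of the diffusion idea: corrupt data with Gaussian noise over
# many steps (forward process); a network trained to predict that noise
# can then remove it step by step, starting from random noise.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alphas_cum = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal fraction

def add_noise(x0, t):
    """Forward process: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = torch.randn_like(x0)
    a = alphas_cum[t]
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

x0 = torch.randn(1, 3, 64, 64)    # stand-in for a training image
xt, eps = add_noise(x0, t=500)
# A denoiser would be trained with: loss = mse(model(xt, t), eps)
```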
#7 · about 3 minutes
Fine-tuning diffusion models for custom subjects and styles
Diffusion models can be fine-tuned on a small set of images to generate new content featuring a specific person, object, or artistic style.
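A hedged sketch of what such fine-tuning looks like in code. The tiny convolution below is a placeholder for the real denoising U-Net, the noising is simplified, and text conditioning is omitted; the actual procedure (e.g. DreamBooth-style tuning) trains the real U-Net on a few subject photos paired with a prompt containing a rare identifier token:

```python
# Sketch of subject fine-tuning: reuse the noise-prediction loss, but
# only on a handful of subject images and only for a short run.
import torch

denoiser = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder U-Net
optimizer = torch.optim.AdamW(denoiser.parameters(), lr=1e-5)
subject_images = torch.randn(4, 3, 64, 64)  # stand-in for 3-5 photos of the subject

for step in range(100):                     # short run; no full retraining needed
    noise = torch.randn_like(subject_images)
    noisy = subject_images + noise          # simplified forward process
    loss = torch.nn.functional.mse_loss(denoiser(noisy), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```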
#8 · about 5 minutes
The core components of text-to-image generation pipelines
Text-to-image models use a U-Net architecture to predict noise and a variational autoencoder to work efficiently in a compressed latent space.
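To see why the latent space matters, this shape-only sketch uses placeholder convolutions with the common 8x downsampling factor; the U-Net denoises the small latent tensor instead of the full-resolution image, and the decoder maps the result back to pixels:

```python
# Why latent diffusion is cheap: the VAE compresses a 512x512 image into
# a much smaller latent tensor, and denoising happens there. The modules
# below are placeholders with realistic shapes, not a real VAE.
import torch

vae_encoder = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)           # placeholder encoder
vae_decoder = torch.nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # placeholder decoder

image = torch.randn(1, 3, 512, 512)
latent = vae_encoder(image)
print(latent.shape)               # torch.Size([1, 4, 64, 64]) -- 48x fewer values
print(vae_decoder(latent).shape)  # torch.Size([1, 3, 512, 512])
```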
#9 · about 3 minutes
Using CLIP to guide image generation with text prompts
Models like CLIP align text and image data into a shared embedding space, allowing text prompts to guide the diffusion process for controlled image generation.
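A toy sketch of the CLIP idea, with random placeholder encoders rather than the real model: both modalities are projected into one shared space, and cosine similarity scores how well each caption matches an image. In the diffusion pipeline, the text embedding conditions the denoiser rather than being compared after the fact:

```python
# Toy CLIP-style alignment: separate text and image encoders project
# into one shared embedding space. Encoders here are random placeholders.
import torch

text_encoder = torch.nn.Linear(300, 128)     # placeholder text encoder
image_encoder = torch.nn.Linear(2048, 128)   # placeholder image encoder

text_features = torch.randn(2, 300)          # stand-ins for two tokenized prompts
image_features = torch.randn(1, 2048)        # stand-in for one image

t = torch.nn.functional.normalize(text_encoder(text_features), dim=-1)
i = torch.nn.functional.normalize(image_encoder(image_features), dim=-1)
similarity = i @ t.T                          # higher score = better text/image match
print(similarity)                             # shape (1, 2): one score per prompt
```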
#10 · about 3 minutes
Exploring advanced use cases and NVIDIA's eDiff-I model
Image generation enables applications like synthetic asset creation and super-resolution, with models like NVIDIA's eDiff-I focusing on high-quality, bias-free results.