Fabian Pottbäcker, Thomas Endres & Martin Foertsch
AI'll Be Back: Generative AI in Image, Video, and Audio Production
How does AI transform random noise into a coherent video? This talk explains the diffusion models and transformer architectures behind tools like Sora and Midjourney.
#1 about 2 minutes
The hype and promise of generative AI
Generative AI is at the peak of the Gartner Hype Cycle, with applications spanning text, image, audio, and video generation.
#2 about 1 minute
How large language models generate text
Large language models (LLMs) function as next-word predictors, generating text token by token in a process that creates a typewriter-like effect.
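The token-by-token loop can be sketched in a few lines of Python. This is only an illustration: the probability table NEXT_TOKEN_PROBS is a made-up stand-in for a trained model, and it conditions on the last token only, whereas a real LLM conditions on the whole preceding sequence. Greedy selection (always taking the most likely token) is also an assumption; real systems usually sample.

```python
# Hypothetical next-token probabilities; a real LLM computes these with a neural network.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 1.0},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def generate(max_tokens=10):
    """Build a sentence one token at a time, always picking the likeliest next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # greedy choice
        if next_token == "<end>":
            break
        tokens.append(next_token)  # each appended token is one "typewriter" step
    return tokens[1:]


print(generate())  # ['the', 'cat', 'sat']
```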
#3 about 3 minutes
Understanding tokenization and semantic embeddings
Text is broken down into numerical tokens and then mapped into a multi-dimensional vector space where semantically similar words are located close together.
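A minimal sketch of both steps, assuming a three-word vocabulary and hand-picked 3-dimensional vectors. VOCAB, EMBEDDINGS, and the whitespace tokenizer are all hypothetical; real models use subword tokenizers and learned embeddings with hundreds or thousands of dimensions.

```python
import math

# Hypothetical vocabulary: each word maps to a numerical token ID.
VOCAB = {"king": 0, "queen": 1, "banana": 2}

# Hypothetical embedding table: row i is the vector for token ID i.
EMBEDDINGS = [
    [0.90, 0.80, 0.10],  # king
    [0.85, 0.90, 0.15],  # queen
    [0.10, 0.05, 0.95],  # banana
]


def tokenize(text):
    """Turn text into token IDs (naive whitespace split, for illustration only)."""
    return [VOCAB[word] for word in text.lower().split()]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


king, queen, banana = (EMBEDDINGS[i] for i in tokenize("king queen banana"))
print(cosine_similarity(king, queen) > cosine_similarity(king, banana))  # True
```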
#4 about 3 minutes
The role of transformers and the attention mechanism
The transformer architecture uses an attention mechanism to weigh the importance of different words in the input sequence to understand context and resolve ambiguity.
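The weighting step can be illustrated with scaled dot-product attention, the standard formulation used in transformers. The sketch below is plain Python for a single attention head; the function names are illustrative, and a real model would first derive queries, keys, and values from the token embeddings through learned projection matrices, which are omitted here.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # How relevant is each input position to this query?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights
```

A query that points in the same direction as one key receives almost all of its weight from that position, which is the sense in which attention lets a word "look at" the context that disambiguates it.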
#5 about 2 minutes
Connecting text and images with the CLIP model
The CLIP model establishes a shared embedding space for text and images, enabling the system to measure the semantic similarity between a text description and a picture.
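To make the shared embedding space concrete, here is a small sketch that picks the caption closest to an image. It does not call the real CLIP model: TEXT_EMBEDDINGS and IMAGE_EMBEDDING are invented 3-dimensional vectors standing in for the outputs of the text and image encoders, and cosine similarity is used as the comparison measure.

```python
import math

# Hypothetical text-encoder outputs for two candidate captions.
TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.2],
    "a photo of a car": [0.1, 0.9, 0.3],
}

# Hypothetical image-encoder output for a picture of a dog.
IMAGE_EMBEDDING = [0.8, 0.2, 0.1]


def similarity(a, b):
    """Cosine similarity between an image vector and a text vector."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def best_caption(image_embedding, text_embeddings):
    """Return the caption whose embedding lies closest to the image embedding."""
    return max(
        text_embeddings,
        key=lambda caption: similarity(image_embedding, text_embeddings[caption]),
    )


print(best_caption(IMAGE_EMBEDDING, TEXT_EMBEDDINGS))  # a photo of a dog
```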
#6 about 7 minutes
How diffusion models create images from noise
Diffusion models generate images through an iterative process of predicting and subtracting noise from a random starting point, guided by a text prompt's embedding.
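The predict-and-subtract loop can be sketched as follows. This is a deliberately simplified toy: the "image" is four pixel values, predict_noise is a hypothetical oracle that already knows the clean target for each prompt (in a real diffusion model it is a trained neural network conditioned on the prompt embedding), and the fixed step_size replaces the noise schedule real samplers use.

```python
import random

# Hypothetical "prompt -> clean image" lookup, standing in for what a trained model has learned.
TARGETS = {
    "bright": [1.0, 1.0, 1.0, 1.0],
    "dark": [0.0, 0.0, 0.0, 0.0],
}


def predict_noise(image, prompt):
    """Toy noise predictor: the difference between the current image and the clean target."""
    return [pixel - clean for pixel, clean in zip(image, TARGETS[prompt])]


def generate_image(prompt, steps=50, step_size=0.2, seed=0):
    """Start from random noise and repeatedly subtract a fraction of the predicted noise."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in range(4)]  # pure noise
    for _ in range(steps):
        noise = predict_noise(image, prompt)
        image = [pixel - step_size * n for pixel, n in zip(image, noise)]
    return image


print([round(pixel, 3) for pixel in generate_image("bright")])  # [1.0, 1.0, 1.0, 1.0]
```

Removing only a fraction of the predicted noise per step is what makes the process iterative: each pass yields a slightly cleaner image that is fed back into the next prediction.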
#7 about 5 minutes
Applying diffusion transformers to video generation
Video generation uses a diffusion transformer to maintain coherence across frames by processing video in patches and applying the denoising process to the entire sequence.
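One way to picture "processing video in patches" is to cut the clip into small blocks that span both space and time and flatten each block into one token for the transformer. The sketch below assumes a grayscale video stored as nested lists indexed video[frame][row][column]; the function name patchify and the patch sizes are illustrative choices, not details taken from the talk.

```python
def patchify(video, frames_per_patch, patch_height, patch_width):
    """Split a video into flattened blocks covering several frames and a small image region."""
    num_frames, height, width = len(video), len(video[0]), len(video[0][0])
    if num_frames % frames_per_patch or height % patch_height or width % patch_width:
        raise ValueError("video dimensions must be divisible by the patch dimensions")

    patches = []
    for t0 in range(0, num_frames, frames_per_patch):
        for y0 in range(0, height, patch_height):
            for x0 in range(0, width, patch_width):
                # Flatten one block into a single token vector.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + frames_per_patch)
                    for y in range(y0, y0 + patch_height)
                    for x in range(x0, x0 + patch_width)
                ]
                patches.append(patch)
    return patches


# A 4-frame, 4x4-pixel clip whose pixel values encode their own position.
video = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)] for t in range(4)]
patches = patchify(video, 2, 2, 2)
print(len(patches), patches[0])  # 8 [0, 1, 4, 5, 16, 17, 20, 21]
```

Because every patch contains pixels from more than one frame, a transformer attending over the full patch sequence sees the whole clip at once, which is one plausible reading of how coherence across frames is maintained.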
#8 about 1 minute
Advanced techniques for video manipulation and editing
Beyond simple generation, models can perform image-to-video conversion, extend existing clips, interpolate between two different videos, or edit specific regions.
#9 about 2 minutes
Current limitations and physical inconsistencies in AI video
Generative video models still struggle to understand cause and effect, which leads to physically impossible events and to objects that appear or behave illogically.
#10 about 3 minutes
Ethical challenges of generative AI training data
Major ethical concerns include training models on copyrighted or publicly available data without consent, which has led to legal challenges and open questions about ownership.