Stanislas Girard

Chatbots are going to destroy infrastructures and your cloud bills

That simple AI feature is secretly a costly monolith. Learn how to separate fast and slow tasks before your cloud bill explodes.

#1 · about 3 minutes

Comparing web developers and data scientists before GenAI

Before generative AI, web developers focused on CPU-bound tasks and horizontal scaling, while data scientists worked with GPU-bound tasks that demanded vast compute resources.

#2 · about 3 minutes

The new AI engineer role and the RAG pipeline

The emerging AI engineer role combines web development and data science skills, and is often applied to building RAG (retrieval-augmented generation) pipelines for data ingestion and querying.
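
The talk frames a RAG pipeline around those two phases: ingestion (chunk and index documents) and querying (retrieve the closest chunks and ground the model's answer on them). Here is a deliberately toy, self-contained sketch of both; the bag-of-words "embedding", the in-memory store, and call_llm are placeholders for a real embedding model, vector database, and model client.

```python
"""Toy RAG sketch: ingest documents, retrieve the closest chunks, answer."""
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in embedding: word counts. Swap in a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        # Ingestion phase: embed each chunk and index it.
        self.items.append((embed(text), text))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Query phase: return the chunks closest to the question.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

def call_llm(prompt: str) -> str:
    return f"[model response to a {len(prompt)}-character prompt]"  # stub

def answer(question: str, store: VectorStore) -> str:
    context = "\n".join(store.search(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

store = VectorStore()
store.add("The ingestion job chunks PDFs and embeds each chunk.")
store.add("Chat queries embed the question and retrieve the top chunks.")
print(answer("How does querying work?", store))
```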

#3 · about 2 minutes

Key architectural challenges in building GenAI apps

Generative AI applications face unique architectural problems, including long response times, sequential bottlenecks, and the difficulty of mixing CPU-bound and GPU-bound processes.
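
To make the sequential bottleneck concrete: a chat request does fast CPU-bound work (routing, retrieval) and then waits seconds on a GPU-bound generation step, so a worker that handles requests strictly one at a time spends almost all of its time blocked. The timings below are simulated for illustration, not measurements from the talk.

```python
"""Toy illustration of the sequential bottleneck (all timings simulated)."""
import asyncio
import time

async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(2.0)  # stand-in for a multi-second generation step
    return f"answer to {prompt!r}"

async def handle_request(question: str) -> str:
    context = f"retrieved context for {question}"  # pretend vector search (milliseconds)
    return await fake_llm_call(f"{context}\n{question}")

async def main() -> None:
    questions = [f"question {i}" for i in range(5)]

    start = time.perf_counter()
    for q in questions:  # sequential: latencies add up, ~10 s for 5 requests
        await handle_request(q)
    print(f"sequential: {time.perf_counter() - start:.1f}s")

    start = time.perf_counter()
    await asyncio.gather(*(handle_request(q) for q in questions))  # overlap the waiting, ~2 s
    print(f"concurrent: {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```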

#4 · about 3 minutes

How a simple chatbot evolves into a large monolith

Adding features like document ingestion and web scraping to a simple chatbot can rapidly increase its resource consumption and Docker image size, creating a complex monolith.
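
A hypothetical sketch of what that monolith looks like in code, assuming a FastAPI app purely for illustration: the latency-sensitive chat endpoint shares one process, one Docker image, and one scaling policy with heavy ingestion and scraping endpoints, so every new dependency (PDF parsing, embedding models, a headless browser) inflates the image and every replica pays for all of it. The helper functions are toy stand-ins.

```python
"""Hypothetical chatbot monolith after a few "small" feature additions."""
from fastapi import FastAPI

app = FastAPI()
INDEX: list[str] = []

async def call_llm(prompt: str) -> str:  # stand-in for a real model client
    return f"[answer grounded on {len(INDEX)} indexed chunks]"

def index_text(text: str) -> None:  # stand-in for chunking + embedding (heavy ML dependencies)
    INDEX.append(text)

def crawl_site(url: str) -> list[str]:  # stand-in for a headless-browser crawler
    return [f"contents of {url}"]

@app.post("/chat")  # latency-sensitive: a user is waiting on every call
async def chat(message: str) -> dict:
    return {"answer": await call_llm(message)}

@app.post("/ingest")  # heavy: parsing + embedding, CPU- and RAM-hungry
async def ingest(text: str) -> dict:
    index_text(text)
    return {"status": "indexed"}

@app.post("/scrape")  # heavy and slow: can tie the process up for minutes
async def scrape(url: str) -> dict:
    pages = crawl_site(url)
    for page in pages:
        index_text(page)
    return {"status": "scraped", "pages": len(pages)}
```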

#5 · about 4 minutes

Refactoring a monolithic AI app into a service architecture

To manage complexity and cost, a monolithic AI application should be refactored by separating user-facing logic from heavy background tasks into distinct, independently scalable services.
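
A minimal sketch of that split, assuming FastAPI for the user-facing service and Celery with a Redis broker for the background worker (both are illustrative choices, not prescribed by the talk): the chat API stays small and stateless, heavy ingestion jobs go onto a queue, and the worker that consumes them can be sized and scaled, even to zero, on its own.

```python
"""Sketch of the split: a thin chat service plus a separate ingestion worker."""
from celery import Celery
from fastapi import FastAPI

queue = Celery("tasks", broker="redis://localhost:6379/0")

# --- worker service: its own container and image, scaled (even to zero) independently ---
@queue.task
def ingest_document(text: str) -> None:
    # Chunking, embedding, and indexing happen here, not in the API pods.
    print(f"indexed {len(text)} characters")

# --- chat service: small image, low memory, scales with user traffic ---
api = FastAPI()

async def call_llm(prompt: str) -> str:  # stand-in for a real model client
    return f"echo: {prompt}"

@api.post("/chat")
async def chat(message: str) -> dict:
    return {"answer": await call_llm(message)}

@api.post("/ingest")
async def ingest(text: str) -> dict:
    ingest_document.delay(text)  # enqueue and return immediately; the worker does the heavy lifting
    return {"status": "queued"}
```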

#6 · about 3 minutes

Choosing the right architecture for your application's workload

A monolithic architecture is suitable for low or continuous workloads, while a service-based approach is necessary for applications with high or spiky traffic to manage costs and scale effectively.
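
A back-of-the-envelope comparison of why spiky traffic punishes the always-on approach; every number below is a made-up placeholder, not a quoted price.

```python
# Back-of-the-envelope comparison; all figures are hypothetical placeholders.
GPU_HOURLY_RATE = 3.00        # placeholder $/hour for a GPU instance
HOURS_IN_MONTH = 730
BUSY_HOURS_PER_MONTH = 40     # spiky workload: a few ingestion bursts per week

always_on_monolith = GPU_HOURLY_RATE * HOURS_IN_MONTH          # GPU idles most of the month
scale_to_zero_worker = GPU_HOURLY_RATE * BUSY_HOURS_PER_MONTH  # worker only runs during bursts

print(f"always-on monolith GPU node : ${always_on_monolith:,.0f}/month")
print(f"scale-to-zero worker service: ${scale_to_zero_worker:,.0f}/month")
# With a steady, continuous workload the two figures converge, which is why
# a monolith can remain the simpler and perfectly reasonable choice there.
```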

#7 · about 2 minutes

Overlooked challenges of running AI applications in production

Beyond core architecture, running AI in production involves complex challenges like managing GPUs on Kubernetes, model versioning, data compliance, and testing non-deterministic outputs.
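
On the testing point, a common workaround is to assert properties of the model's output (parseable format, required fields, key facts) rather than exact strings. A pytest-style sketch, where generate_answer is a hypothetical wrapper around your model, stubbed here so the tests run without one:

```python
"""Pytest-style sketch: test properties of the output, not exact strings."""
import json

def generate_answer(question: str) -> str:  # stand-in: call your model here
    return json.dumps({"answer": "Paris", "sources": ["doc-12"]})

def test_answer_is_valid_json_with_sources():
    payload = json.loads(generate_answer("What is the capital of France?"))
    assert "answer" in payload          # property: expected shape
    assert payload["sources"]           # property: cites at least one source

def test_answer_contains_expected_fact():
    payload = json.loads(generate_answer("What is the capital of France?"))
    assert "paris" in payload["answer"].lower()  # property: key fact, not exact wording
```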

#8 · about 2 minutes

Using creative evaluations and starting with small models

A creative evaluation using a game like Street Fighter reveals that smaller, faster LLMs can outperform larger ones for many use cases, making them a better starting point.
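
In the spirit of that advice, a tiny evaluation harness can compare a small and a large model on the same cases for quality and latency before committing to the bigger one. Everything below, the model names, the ask() client, and the keyword scoring, is a placeholder sketch.

```python
"""Tiny evaluation harness: same cases, two model sizes, compare before scaling up."""
import time

EVAL_SET = [
    {"question": "What is the capital of France?", "expect": "paris"},
    {"question": "What is 2 + 2?", "expect": "4"},
]

def ask(model: str, question: str) -> str:  # stand-in for a real model client
    return f"{model}: the capital is Paris and 2 + 2 = 4"

def evaluate(model: str) -> None:
    correct = 0
    start = time.perf_counter()
    for case in EVAL_SET:
        if case["expect"] in ask(model, case["question"]).lower():
            correct += 1
    elapsed = time.perf_counter() - start
    print(f"{model}: {correct}/{len(EVAL_SET)} correct in {elapsed:.2f}s")

for model in ("small-model-placeholder", "large-model-placeholder"):
    evaluate(model)
```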
