Patrick Koss

Unveiling the Magic: Scaling Large Language Models to Serve Millions

A single short prompt can exhaust your GPU resources. Learn how a custom proxy and clever rate limiting make it possible to serve large language models to millions of users.
