October 17, 2023

10 min read

What Are Large Language Models?

Luis Minvielle

Developers and writers can finally agree on one thing: Large Language Models, the subset of AIs that drive ChatGPT and its competitors, are stunning tech creations.

Developers enjoying the likes of GitHub Copilot know the feeling: this new kind of tech is simply quicker. But does it understand what it’s saying? Poems can’t be made executable files, so if it were code, would the .exe run?

To understand this, we can check how LLMs work. What’s the programming logic behind their impressive knack for writing stuff up — including code — in superhuman likeness?

RELATED: Best Large Language Models

What is a Large Language Model (LLM)?

A Large Language Model is an AI that is specifically trained to understand and generate human text. In more simple words, it’s a computer program that can generate written text in a human-like fashion.

The 1941 renowned short story “The Library of Babel,” written by Jorge Luis Borges and many times brought up to explain the Internet, might instead help explain what LLMs are. An LLM starts by reading all the books in the massive library and uncovering patterns. What’s the possibility that a particular word, such as “pal,” comes after the word “pen”? The output depends on the neuron's “weights,” numbers that tell the neuron how vital each input is. LLM learns by comparing its guesses with the actual words from books and adjusting its weights to minimise errors, which is called learning.

And just like the Library of Babel, a lot of the possibilities of combining words and sentences are just gibberish; it’s through the mechanism of learning and reading the vetted books that the LLM acquires the knack to writing combinations of words that sound like human sentences.

The most famous LLMs right now are GPT 3.5 and 4.0, and Pathways Language Model (PaLM), which are the foundations behind ChatGPT, Bing, and Google Bard.

Key characteristics of an LLM include:

Massive scale: LLMs are enormous neural networks with millions or even billions of parameters (the “brains” of the model). These parameters help the model understand and generate text.
Pre-trained on gigantic text datasets: Before they can do anything useful, LLMs are pre-trained on vast amounts of text data from the internet — and even synthetic (i.e. made up by LLMs) data, according to specific industry leakers. This pre-training phase teaches them about grammar, vocabulary, and facts about the world.
Fine-tuning for specific tasks: After pre-training, developers can fine-tune LLMs for specific tasks like translation, content generation, chatbots, and more. This fine-tuning makes them specialised for particular jobs. They can even add a barrier to misdemeanant write-ups, such as “how to hot-wire somebody else’s car.” Bypassing these barriers is known as “prompt injection,” and, honestly, it’s pretty easy to do.
Based around the transformer architecture: Transformers are the underlying architecture that allows LLMs to capture context and have what, as humans, we perceive as “attention”. Transformers were introduced around five years ago and are fundamental to explaining the rise of LLMs.

Why are LLMs called stochastic parrots?

With this explanation, it’s easy to understand why LLMs, especially when called out to be sentient, conscious, and alive, are quickly snubbed off as stochastic parrots 🦜. Stochastic means “random and unpredictable, but that can still be analysed statistically.” A parrot simply repeats what it hears. So, AI experts claim that LLMs are repeaters of structures that read like human-written text to us but, to them, convey no sense. They are random generators of human-like text.

Thank heavens Bing fixed that pesky hallucination problem. pic.twitter.com/TbbAQgy7hc
— Gary Marcus (@GaryMarcus) October 4, 2023

This architecture overview helps explain hallucinations and AI-generated code that won’t run, which are a bit of a blight for AIs. LLMs can hallucinate facts and code because they learn from online text which often includes wrong or misleading information. LLMs cannot verify if what they generate works in real life, resulting in potentially correct-looking but non-functional code or even false accusations (some professionals have suffered from this).

They may also get fixated on unusual aspects of the text or acquire biases, leading to credible yet incorrect or nonsensical output. If you ask ChatGPT to provide a link to a Washington Post article about how the right whales are invading California, the LLM will put together an incredibly plausible link that will look genuine, but it will still be made up. The same can happen with code.

What is prompt injection?

Since LLMs look like they know what they’re saying but are actually just repeating words and probabilities, they carry biases and can share prankish texts. Companies behind LLMs add obstacles so that the output isn’t harmful or against their rules. But by providing very specific prompts, any user can bypass this limitation. This is called prompt injection. Home-brewed prompt injection made strides on the web when someone asked ChatGPT the best sites “not to visit” if they were against torrents. The chat, of course, proceeded to list the top torrenting sites so that the user could avoid them.

God-tier prompt injection: using the knowledge cutoff date against ChatGPT pic.twitter.com/m7lDYjD7GP
— Justine Moore (@venturetwins) October 6, 2023

‍

Even if companies update their LLMs to stave prompt injections off, users quickly find ways around it. A report circulating in October 2023 demonstrated that success in obtaining prejudicial output was considerably higher when prompting in any language besides English.

The LLM stack

Now that we observed how it theoretically works, let’s check some of the key components that you’ll read in job descriptions related to LLMs.

Transformer architecture

Transformers are the machine learning architecture that AIs are using to understand and generate language. It was introduced in a 2017 paper called “Attention Is All You Need.” One of the most famous transformers you’ll read about is called BERT, initially unveiled by Google, which is often considered to be running behind in the AI contest. The transformer architecture lets computers read words, understand their meaning and order, and generate a coherent response. It also allows generative AI to create music, for example. In simple steps, this is how transformers help create text:

Inputs and Embeddings: We give the LLM words (tokens). It changes them into numbers (embeddings) because it can't understand words directly.
Positional Encoding: Since the order of words matters, this tells the LLM the position of each word in a sentence.
Encoder: This part reads the input and tries to understand its meaning. It uses self-attention — a concept proper to transformers — to do this.
Outputs (shifted right): While training, the LLM learns to predict the next word by seeing previous words. We shift the sequence, so it only sees words before the current one.
Output Embeddings: After making predictions, the LLM turns these predictions back into numbers (like the reverse of step 1).
Decoder: This part takes the understood meaning from the Encoder and tries to form a reply or continuation.
Linear Layer and Softmax: Finally, the LLM refines its predictions, turning them into probabilities for each possible word it might output.

Natural Language Processing (NLP)

NLP is made possible by the transformer architecture. It’s a specialised branch of AI that enables computers to understand and generate human language with depth and context. For developers, this means LLMs can comprehend nuances, idioms, and context. So you could call yourself a NLP expert, for example.

Deep Learning

Deep Learning, the power station helping LLMs start off, employs neural networks inspired by the human brain. LLMs take this concept to any level with multiple layers and countless parameters, enabling them to understand sentence structures, recognise text patterns, and generate responses at a scale that Internet users had never seen before ChatGPT dropped.

While you don't need to be a coding expert to appreciate LLMs, understanding their technology reveals that what’s behind them is the old, celebrated “machine learning.”

Parameters

The true strength of LLMs lies in their parameters. These are like a tech giant’s proprietary algorithm (like the one Zuckerberg’s roomie wrote on the dorm’s window). LLMs can feature millions or even billions of parameters, fine-tuning their ability to understand and generate language. For instance, some advanced LLMs may have over 1.5 billion parameters, allowing them to process and generate text with human-like syntax. For your comparison, optimistic online users claim that Ubuntu has over 50 million lines of code — insignificant compared to the quantity of parameters. These parameters help explain why whatever LLMs turn out to say sounds believable, even if it’s gibberish.

Great explanation on how Large Language Models work (aka ChatGPT).

No jargon. Not much mathematics:https://t.co/wAry1I0vBs #ai #llm #chatgpt
— Raphael A. Bauer (@Raphael_A_Bauer) September 27, 2023

‍

Should developers rely on LLM code?

No. Developers can’t wholly trust code churned out by LLMs, which might turn in biased, incorrect, albeit very good-looking code. Moreover, online testimonials claim that up to 80% of ChatGPT 4’s code won’t even run.

Developers can instead rely on LLMs to check their code with tools such as ChatGPT Code Interpreter. Moreover, they could count on AIs such as GitHub Copilot. But relying entirely and counting on partially are different enough.

"The first rule of using ChatGPT for coding is, you should only be using ChatGPT for coding if you don't actually need to use ChatGPT for coding.

Like, it's good for ideas because it's basically trained on Stack Overflow and the docs, and it's impossible to have heard of or remember every package, module, and function.

But if you don't understand what it gives you, and you just paste it in, you're not learning anything and are leaving yourself open to big problems."

— Reddit user bamacgabhann

‍

How can developers contribute to LLMs?

AI-related jobs are some of the most trending in any market. Developers who want to contribute to LLMs and work on new tech and projects should focus on honing their Java and Python skills, which are in demand for AI development. One of the best ways to learn quickly and add valuable entries to a resume seems to be by contributing to TensorFlow or PyTorch, two of the Internet’s best open-source projects for which to solve issues.

There’s also a market for developers who can deploy LLMs, so you can start by studying open-source initiatives such as LangChain and afterwards checking DevOps Engineer salaries to stay motivated.

But there’s room for frontend developers as well. Machine learning can run on the browser with frameworks such as TensorFlow.js; moreover, prompt engineers, who’d be a sort of “no-code” LLM experts, seem to land very hefty salaries in 2023. As prompt engineers, perhaps JavaScript developers who are comfortable with syntax might find inner linings that infrastructure experts might miss.

Catching up to speed with Large Language Models

In this straightforward exploration of Large Language Models (LLMs), we've got into their not-that-mysterious inner workings. It’s a groundbreaking piece of tech, sure, but it’s still built upon programming principles that even novice developers can still grasp and understand with ease.

Large Language Models seem to shape up fast, but you can also catch up to speed if you’re invested enough in the tech world. And at WeAreDevelopers, we're fully committed to following the latest advances in software and AI. Eager to explore the universe of LLMs? It might be a good idea to get a job in AI. So have a look at our job board for LLM-related opportunities and see where you get to. Good luck!