Do you feel like you don’t understand at all how current AI setups work?
Here’s Pareto’s AI Summer Basics to get you back on track, and to take you even further if the topic interests you. Let’s take a closer look at LLMs, large language models.
Start by going outside or staring at a lake. Everything is easier with a refreshed mind. Personally, I focus mainly on this. We’ll get back to this in August.
Still here? Let’s start with the basics. How does an LLM work? Read the Ars Technica(1) article on the subject, then come back. Hopefully, you’ll then have some understanding of the basic concepts: vectors, transformers, and transformer sub-components such as the attention mechanism (which works out what relates to what) and the feed-forward layers (which compute the probabilities for the next word). If any part was unclear, go to ChatGPT, Gemini, Copilot, etc., and ask for clarification. The models are quite good at explaining their own workings.
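If you want to see the attention idea in code, here’s a minimal sketch of scaled dot-product attention, the core of each attention sub-layer. The numbers are invented and there are no learned weight matrices or multiple heads here; it only shows the “what relates to what” computation.

```python
# Minimal sketch of scaled dot-product attention (toy vectors, no learned weights).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each position weighs every other position and blends their vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each word relates to each other word
    weights = softmax(scores, axis=-1)   # turn the scores into probabilities (rows sum to 1)
    return weights @ V                   # mix the value vectors according to those weights

# Three toy "word" vectors; in a real model these come from embeddings and earlier layers.
x = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
print(attention(x, x, x))
```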
Put simply, language models are created by sifting through millions and millions of pages of text, images, and audio to form a massive network of connections. Think of the “everything is connected” meme guy(2) with his flip chart. The LLM is both that person and the chart. On the flip chart, every word is somewhere, and near each word are its related words.
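The flip-chart picture can also be put in code: words become vectors, and “nearby” means the vectors point in roughly the same direction. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions and are learned from data.

```python
# Toy word vectors and cosine similarity: related words end up "near" each other.
import numpy as np

embeddings = {
    "dog":  np.array([0.9, 0.8, 0.1]),   # invented numbers, for illustration only
    "cat":  np.array([0.8, 0.9, 0.2]),
    "card": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    """1.0 means pointing the same way, values near 0 mean unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["dog"], embeddings["cat"]))    # high: close on the flip chart
print(cosine(embeddings["dog"], embeddings["card"]))   # lower: far apart
```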
What happens when you ask something, for example, “Pekka plays dog. What does Pekka do?” (“Dog” is a Finnish card game.) The language model works through several transformer layers to deduce what is being asked and what kind of answer is wanted, essentially searching for the most probable next word. One layer ensures correct spelling. Another ensures that the topic is dogs. A third links the word “plays” with “dog”, suggesting that this is probably not about a living dog but something else. A fourth layer might notice that there is a card game called “dog” on Wikipedia(3). After many layers, the result is a single word that the model considers a likely good continuation of the input, such as “Pekka”. That word is then added to the answer, fed back into the model, and the process starts again. The model might then suggest “plays”, “card game”, and finally “.”.
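Here is that feedback loop as a toy sketch. The “model” below is just a hand-written probability table, not a real transformer, but the loop itself is the same idea: pick a likely next word, append it to the answer, and feed everything back in.

```python
# Toy autoregressive generation loop with a hand-written probability table.
import random

toy_model = {
    # last word -> possible next words with (made-up) probabilities
    "Pekka": [("plays", 0.8), ("sleeps", 0.2)],
    "plays": [("card", 0.7), ("outside", 0.3)],
    "card":  [("game", 0.9), ("trick", 0.1)],
    "game":  [(".", 1.0)],
}

def generate(start, max_words=10):
    words = [start]
    while words[-1] in toy_model and len(words) < max_words:
        choices, probs = zip(*toy_model[words[-1]])
        words.append(random.choices(choices, weights=probs)[0])  # sample the next word
    return " ".join(words)

print(generate("Pekka"))   # e.g. "Pekka plays card game ."
```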
The model doesn’t “know” English; the grammatically correct-looking sentence is just the most probable continuation of a series of words. This works surprisingly well, but you cannot forget this nature: the response is always based on probabilities, so the answer can also be fiction. In this case, ChatGPT responded, “Pekka imitates or portrays a dog playfully or in a performance.” The prompt was admittedly vague, and so was the answer.
Language models are thus models that fit into a single file on your hard drive. You can even download them from Ollama(4); instructions for running them on your own computer can be found here(5). The bigger and smarter the model, the more processing power it requires from your computer, especially graphics card memory; otherwise, you’ll be waiting a long time for each suggested word. However, even small models that you can run on your own computer work quite well for tasks like converting speech to text.
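Once a model is running locally through Ollama, you can also talk to it from code. The sketch below assumes you have pulled a model (for example with `ollama pull llama3`) and that the local server is listening on its default port; check the Ollama documentation if the endpoint or field names have changed since this was written.

```python
# Minimal sketch: ask a locally running Ollama model a question over its HTTP API.
import json
import urllib.request

payload = {
    "model": "llama3",                                    # whichever model you pulled
    "prompt": "Pekka plays dog. What does Pekka do?",
    "stream": False,                                      # ask for one complete JSON reply
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",                # Ollama's default local endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```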
Technical addition: the text above omits an essential part, the tokenizer. It’s worth getting acquainted with, for instance through ChristopherGS’s article(6). The article is quite technical, but it explains well how computers actually perceive words: as pieces of words (tokens). What does this mean in practice? Language models don’t see whole words but pieces of words, so they can miss essential details.
A classic example is “how many r’s are there in strawberry” (*), though current models have been specifically trained to get this right. Model developers have done a lot of work to improve it, but it’s still worth being aware of.
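To make the strawberry problem concrete, here is a toy tokenizer. The vocabulary is invented; real tokenizers (such as the BPE scheme described in the linked article) learn their chunks from data, but the effect is the same: the model reasons over a couple of chunks, not over individual letters.

```python
# Toy tokenizer: greedily split a word into the longest chunks found in a fixed vocabulary.
TOY_VOCAB = {"straw", "berry", "str", "aw", "ber", "ry"}

def toy_tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):       # try the longest possible chunk first
            if word[i:j] in TOY_VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])              # unknown character: keep it as-is
            i += 1
    return tokens

print(toy_tokenize("strawberry"))   # ['straw', 'berry']
# The model sees two chunks, not ten letters, which is why counting the r's used to trip it up.
```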
Links:
1. https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/
2. https://knowyourmeme.com/memes/pepe-silvia
3. https://fi.wikipedia.org/wiki/Koira_(korttipeli)
4. https://ollama.com/library
5. https://www.kdnuggets.com/ollama-tutorial-running-llms-locally-made-super-simple
6. https://christophergs.com/blog/understanding-llm-tokenization