Beginner🧠

What is an LLM?

Understand in 5 minutes what powers ChatGPT, Claude, Gemini and all the others. From a simple analogy to the Transformer architecture, everything you need to know.

10 min read · Updated May 3, 2026

In one sentence

An LLM (Large Language Model) is an artificial intelligence program trained on billions of sentences to predict the next word. This is what powers ChatGPT, Claude, Gemini, Mistral and all the others.

🦜
The analogy that works
Imagine an ultra-cultured parrot that has read Wikipedia, Reddit, millions of books, and the entire Internet. When you talk to it, it guesses which word should come next, again and again, until it forms a coherent response. That's an LLM.

You don't need to know more than this to use ChatGPT or Claude daily. The key thing to remember: an LLM predicts text, it doesn't "understand" like a human. It's very good at summarizing, translating, writing, and coding. But it can invent things that are false (called "hallucinations"), and it knows nothing about events after its training cutoff date.

🚀 Want to compare LLMs?

30 models compared: Claude, GPT-5, Gemini, Mistral, DeepSeek... Filter by price, performance, language.

See comparison

How does it really work?

When you write "The sky is…" to an LLM, it computes the probability of each possible next word:

Probabilities of the next word after "The sky is":

  • blue: 42%
  • gray: 18%
  • beautiful: 12%
  • cloudy: 9%
  • starry: 5%
  • others: 14%

It picks a word (usually the most probable, but not always: a bit of randomness keeps it creative), adds it to the sentence, and starts over for the next word. That's it.
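As a rough sketch, that picking step is weighted sampling with a "temperature" knob. This toy Python version uses the probabilities from the table above; `sample_next_word` is an illustrative name, not any real API:

```python
import random

# Toy probabilities from the table above (next word after "The sky is")
probs = {"blue": 0.42, "gray": 0.18, "beautiful": 0.12,
         "cloudy": 0.09, "starry": 0.05, "others": 0.14}

def sample_next_word(probs, temperature=1.0, rng=random):
    """Pick the next word. Lower temperature favors the most probable word;
    higher temperature flattens the distribution (more 'creative')."""
    if temperature == 0:
        return max(probs, key=probs.get)  # greedy decoding: always the top word
    # Rescale each probability by the temperature, then sample proportionally
    weights = {w: p ** (1.0 / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for word, weight in weights.items():
        r -= weight
        if r <= 0:
            return word
    return word  # guard against floating-point rounding
```

With `temperature=0` this always returns "blue"; at higher temperatures the rarer words get picked more often.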

This process is called inference or autoregressive generation. For each generated word, the LLM rereads the entire conversation to guess the next one.
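The whole autoregressive loop fits in a few lines. In this sketch, `toy_model` is a hard-coded stand-in for a real model, which would instead run a neural network over the full sequence at every step:

```python
def generate(prompt_words, predict_next, max_words=20):
    """Autoregressive generation: at every step the model re-reads the whole
    sequence so far and predicts exactly one more word."""
    words = list(prompt_words)
    for _ in range(max_words):
        next_word = predict_next(words)  # one full 'model' call per word
        if next_word is None:            # stand-in for an end-of-text token
            break
        words.append(next_word)
    return " ".join(words)

# Hard-coded stand-in model that knows how to finish one sentence
def toy_model(words):
    completions = {("The", "sky", "is"): "blue"}
    return completions.get(tuple(words))  # None = stop generating
```

Note that generating 100 words means 100 separate model calls, each re-reading everything before it: this is why long outputs are slow and expensive.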

Why 'Large'?
"Large" refers to the model size: its number of parameters. A parameter is a kind of dial adjusted during training. GPT-5 has approximately 1.8 trillion parameters. The bigger it is, the more powerful — and the more expensive to run.

The ingredients of an LLM

Three things make a good LLM:

  1. An architecture: the "shape" of the neural network. Today, almost all use the Transformer architecture, invented by Google in 2017.
  2. Training data: trillions of words from the web, books, code, and scientific papers.
  3. Computing power: training GPT-5 cost over $500 million and required tens of thousands of GPU cards for months.
🧠
The Transformer secret
The Transformer uses a mechanism called attention: for each generated word, the model "looks" at all previous words and decides which ones matter for guessing the next word. This is what allows it to handle context over long distances.
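Here is a minimal sketch of that attention step in plain Python: one query (the word being generated) scores every previous word, and the scores decide how much each one contributes. Real models use learned matrices, many attention heads, and thousands of dimensions:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector:
    score every previous word, softmax the scores, mix the values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights
```

A previous word whose key vector points in the same direction as the query gets a higher weight, so its value contributes more to the output; that is the "looking at the words that matter" part.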

Model sizes today

Here are the approximate sizes of major models in 2026:

Model sizes (in billions of parameters):

  • GPT-5 (OpenAI): 1,800B
  • Claude Opus 4.7 (Anthropic): 1,500B
  • Gemini 3 Pro (Google): 1,200B
  • Llama 4 Behemoth (Meta): 800B
  • DeepSeek V3 (DeepSeek): 671B
  • Mistral Large 3 (Mistral): 123B

⚠️ Note: a bigger model is not always better. Many use a trick called MoE (Mixture of Experts), where only a portion of the parameters activates per request. It's faster and cheaper. Mistral Large 3, for example, with "only" 123B parameters, competes with models 10x larger.
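A minimal sketch of the MoE idea: a router scores every expert, but only the top-k actually run for a given token. In real models the router and experts are learned neural networks; here they are plain functions, and `moe_forward` is an illustrative name:

```python
def moe_forward(token, experts, router, top_k=2):
    """Mixture of Experts: the router scores all experts, but only the
    top_k highest-scoring ones actually run for this token."""
    scores = router(token)
    active = sorted(range(len(experts)),
                    key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in active)
    # Weighted mix of just the active experts' outputs
    return sum(scores[i] / total * experts[i](token) for i in active)

# Toy demo: 4 'experts' (plain functions), a fixed router, top-2 routing
experts = [lambda t: 1 * t, lambda t: 2 * t, lambda t: 3 * t, lambda t: 4 * t]
router = lambda t: [0.1, 0.2, 0.3, 0.4]
```

With `top_k=2`, only 2 of the 4 experts do any work per token, which is the whole point: the model can have a huge total parameter count while paying for only a fraction of it on each request.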

How does an LLM learn?

Training an LLM happens in 3 steps:

1. Pre-training (the "massive reading")

The model reads billions of pages of text and learns to predict the next word. It's the longest step (several months) and the most expensive.
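In miniature, the pre-training objective is just "assign high probability to the word that actually came next". This toy function computes that cross-entropy loss for a single prediction, reusing the probabilities from the earlier example:

```python
import math

def next_word_loss(predicted_probs, actual_next_word):
    """Cross-entropy for one prediction: low when the model gave the
    word that actually came next a high probability."""
    return -math.log(predicted_probs.get(actual_next_word, 1e-9))

# The model read "The sky is"; in the training text, the next word is "blue"
predicted = {"blue": 0.42, "gray": 0.18, "beautiful": 0.12}
loss_good = next_word_loss(predicted, "blue")  # small: good prediction
loss_bad = next_word_loss(predicted, "gray")   # larger: worse prediction
```

Training is months of nudging the parameters so this loss goes down, averaged over trillions of such predictions.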

2. Supervised fine-tuning

It is shown human-written examples of good responses: "if asked X, respond with Y in this way". This teaches it to be useful, not just to imitate the Internet.

3. RLHF (Reinforcement Learning from Human Feedback)

Humans compare two model responses and say which is better. The model learns to favor preferred answers. This is what makes Claude polite and helpful rather than cynical like some parts of Reddit.

Good to know
RLHF is what distinguishes a "raw" LLM (which might output anything) from an "aligned" LLM (which follows instructions, refuses dangerous requests, stays polite). It can also make it too cautious sometimes.

Evolution since 2017

Major LLM milestones

  1. The Transformer is born

    Google publishes 'Attention Is All You Need', the paper that changed everything.

  2. BERT and GPT-1

    First large models. BERT understands, GPT generates.

  3. GPT-3 (175B params)

    First model that 'understands everything'. AI lab hype begins.

  4. ChatGPT

    AI goes mainstream. 100 million users in 2 months.

  5. GPT-4, Claude, Llama

    Power race. Llama launches the open-source wave.

  6. Reasoning models

    OpenAI o1, Claude Sonnet: LLMs learn to 'think' before answering.

  7. Multimodal everywhere

    Text + image + audio + video in the same model. Gemini, GPT-4o.

  8. Current frontier

    GPT-5, Claude Opus 4.7, Gemini 3. Models with 1500-1800B params, widespread MoE, autonomous agents.

LLM limitations

Now that you know how it works, here's what an LLM cannot do (yet):

Hallucinations
An LLM can invent facts that are false but plausible: citing a book that doesn't exist, attributing a quote to the wrong person, making up numbers. It's inevitable, because it predicts text; it doesn't verify anything. Always check important info against a reliable source.
No real-time knowledge
An LLM knows nothing that happened after its cutoff date (the deadline of its training data). If you ask "who won the last World Cup?", it might give a wrong answer or, worse, make one up. Solution: some LLMs (Claude, Gemini, ChatGPT) are now connected to the web and can search in real time.
No deep reasoning
"Classic" LLMs are bad at complex math, multi-step logic, and planning. New "reasoning" models (Claude Opus, GPT-5 thinking, DeepSeek R2) are much better: they take time to "think" before responding. But they remain imperfect.

Real use cases

Concretely, here's what LLMs do really well today:

  • Writing: emails, articles, summaries, translations
  • Coding: generating code, debugging, explaining
  • Summarizing: condensing a 100-page report into 5 bullets
  • Rephrasing: adapting a text for different audiences
  • Brainstorming: throwing 10 ideas on a topic
  • Learning: explaining a concept at different levels
  • Converting: transforming unstructured text into JSON, tables, etc.
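The "converting" use case is mostly prompt discipline: ask for strict JSON, then parse it with a real JSON parser. A sketch, where `call_llm` is a placeholder for whichever chat API you actually use (not a real library call):

```python
import json

def extract_contact(text, call_llm):
    """Ask the model for strict JSON, then parse it with json.loads.
    `call_llm` is a placeholder for whatever chat API you use."""
    prompt = (
        "Extract the person's name and email from this text. "
        'Reply with JSON only, like {"name": "...", "email": "..."}.\n\n' + text
    )
    return json.loads(call_llm(prompt))

# Toy stand-in 'model' so the example runs without any API:
def fake_llm(prompt):
    return '{"name": "Ada Lovelace", "email": "ada@example.com"}'
```

Parsing the reply (rather than trusting it) also catches the failure mode where the model wraps the JSON in chatty text: `json.loads` will simply raise an error you can handle.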

And what remains risky:

  • ⚠️ Precise calculations (use a calculator or tool)
  • ⚠️ Recent facts (without web connection)
  • ⚠️ Nuanced political opinions
  • ⚠️ Medical/legal advice (always validate with a pro)

How to choose your LLM?

Good question — there are 30+ available, all different.

For what use, which model?

If you want… here is the 2026 recommendation:

  • Best for writing (literary quality, perfect English): Claude Opus 4.7
  • Best for coding (complex code generation): Claude Opus 4.7 or GPT-5
  • Multimodal image/video (image analysis, huge context): Gemini 3 Pro (1M tokens)
  • Sovereign and GDPR (hosted in Europe): Mistral Large 3 (FR)
  • Free and self-hostable (run at home): DeepSeek V3 or Llama 4
  • Very cheap (low-cost API): Claude Sonnet or Gemini Flash

🎯 Compare 30 models now

Filter by price, English quality, GDPR compliance, open-source... Find the one that fits you.

Go to comparison


Going further

Now that you know what an LLM is, you can explore:

  • 🎯 The model comparison: see which one fits your use case
  • Prompt engineering: learn how to talk to LLMs
  • 💰 Pay or not: choose between free and paid versions

And keep in mind: an LLM is a very powerful tool, but you have the brain. Use it as a copilot, not as an oracle. ✨

Tags
LLM · Fundamentals · Transformer · Generative AI
