We Compare AI

Transformer

Architecture
Simple Definition

The neural network architecture that underpins all modern large language models, introduced by Google researchers in 2017.

Full Explanation

The Transformer architecture, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), replaced the recurrent neural networks used previously with a self-attention mechanism that lets every token attend to every other token, processing them all in parallel. This enabled efficient training on much larger datasets. Every major LLM (GPT, Claude, Gemini, LLaMA) is based on the Transformer architecture.
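The self-attention step described above can be sketched in a few lines of NumPy. This is an illustrative single-head version; the matrix shapes and random weights here are assumptions for demonstration, not taken from any real model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    # Project every token into query, key, and value spaces at once
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores against every other token in a single matrix multiply,
    # which is what makes the computation parallel rather than sequential
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into attention weights that sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each token's new representation is a weighted mix of all values
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # 4 tokens, 8-dimensional embeddings (illustrative)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                   # (4, 8): one updated vector per token
```

Unlike a recurrent network, nothing here loops over token positions: the whole sequence is transformed with a handful of matrix multiplications, which is why Transformers scale so well on parallel hardware.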

Example

GPT stands for Generative Pre-trained Transformer — named after the architecture it uses.

Last verified: 2026-03-30