Attention Mechanism

Architecture
Simple Definition

The core innovation in Transformer models that allows AI to weigh the importance of different parts of input text when generating each output token.

Full Explanation

Self-attention allows each token in a sequence to 'pay attention' to every other token when computing its representation. This enables models to capture long-range dependencies in text — understanding that 'it' in a sentence refers to something mentioned several paragraphs earlier. Multi-head attention runs this process several times in parallel, each head with its own learned projections, to capture different types of relationships.
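The process above can be sketched as single-head scaled dot-product attention in NumPy. This is a minimal illustration, not a production implementation: the projection matrices `Wq`, `Wk`, `Wv` and the tiny dimensions are stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:            (seq_len, d_model) token embeddings
    Wq, Wk, Wv:   (d_model, d_k) projection matrices (learned in a real model)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each row of `weights` is one token's attention distribution
    # over every token in the sequence (rows sum to 1).
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output row is a weighted mix of all tokens' value vectors.
    return weights @ V

# Toy example: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

Multi-head attention simply runs several such heads in parallel on the same input, each with its own `Wq`/`Wk`/`Wv`, and concatenates the results.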

Last verified: 2026-03-30