Attention Mechanism

Architecture
Simple Definition

The core innovation in Transformer models that allows AI to weigh the importance of different parts of input text when generating each output token.

Full Explanation

Self-attention allows each token in a sequence to 'pay attention' to every other token when computing its representation. This enables models to capture long-range dependencies in text — understanding that 'it' in a sentence refers to something mentioned several paragraphs earlier. Multi-head attention runs this process several times in parallel, each head with its own learned projections, to capture different types of relationships.
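The process above can be sketched as single-head scaled dot-product attention in NumPy. This is a minimal illustration, not a production implementation: the projection matrices `Wq`, `Wk`, `Wv` and the tiny dimensions are stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:            (seq_len, d_model) token embeddings
    Wq, Wk, Wv:   (d_model, d_k) projection matrices (learned in a real model)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each row of `weights` is one token's attention distribution
    # over every token in the sequence (rows sum to 1).
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Each output row is a weighted mix of all tokens' value vectors.
    return weights @ V

# Toy example: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

Multi-head attention simply runs several such heads in parallel on the same input, each with its own `Wq`/`Wk`/`Wv`, and concatenates the results.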

Last verified: 2026-03-30