We Compare AI

Transformer

Architecture
Simple Definition

The neural network architecture that underpins all modern large language models, introduced by Google researchers in 2017.

Full Explanation

The Transformer architecture, introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), replaced the recurrent neural networks used previously with a self-attention mechanism that lets every token attend to every other token, processing them all in parallel. This enabled efficient training on much larger datasets. Every major LLM (GPT, Claude, Gemini, LLaMA) is based on the Transformer architecture.
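The self-attention step described above can be sketched in a few lines of NumPy. This is an illustrative single-head version; the matrix shapes and random weights here are assumptions for demonstration, not taken from any real model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    # Project every token into query, key, and value spaces at once
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores against every other token in a single matrix multiply,
    # which is what makes the computation parallel rather than sequential
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into attention weights that sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each token's new representation is a weighted mix of all values
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # 4 tokens, 8-dimensional embeddings (illustrative)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                   # (4, 8): one updated vector per token
```

Unlike a recurrent network, nothing here loops over token positions: the whole sequence is transformed with a handful of matrix multiplications, which is why Transformers scale so well on parallel hardware.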

Example

GPT stands for Generative Pre-trained Transformer — named after the architecture it uses.

Last verified: 2026-03-30