We Compare AI

Latency Heatmap by Region

Compare model response speeds (median TTFT in ms) across US East, US West, EU West, and Asia Pacific.

| Model | Provider | US East | US West | EU West | Asia Pacific |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | 320 ms | 410 ms | 520 ms | 780 ms |
| GPT-4o mini | OpenAI | 180 ms | 230 ms | 310 ms | 490 ms |
| Claude 3.7 Sonnet | Anthropic | 290 ms | 380 ms | 460 ms | 710 ms |
| Claude 3.5 Haiku | Anthropic | 160 ms | 210 ms | 290 ms | 440 ms |
| Gemini 2.0 Flash | Google | 210 ms | 190 ms | 380 ms | 320 ms |
| Gemini 1.5 Pro | Google | 380 ms | 360 ms | 490 ms | 410 ms |
| Llama 3.3 70B | Meta/Together | 260 ms | 290 ms | 440 ms | 620 ms |
| Mistral Large | Mistral | 340 ms | 420 ms | 280 ms | 590 ms |
Legend:

  • ≤ 250 ms: Fast
  • 251–450 ms: Moderate
  • > 450 ms: Slow

Measurement Notes

  • TTFT (Time to First Token) — All values represent median TTFT in milliseconds, measured across 50 samples per region per model. TTFT is the elapsed time from request dispatch to receipt of the first streamed token byte.
  • Infrastructure — Tests were run from AWS EC2 instances in us-east-1, us-west-2, eu-west-1, and ap-southeast-1 to simulate real cloud-hosted application latency. No VPN or proxy was used.
  • Period — Data collected Q1 2026 during business hours (09:00–17:00 local time) under typical API load. Results may vary under peak traffic or with provider infrastructure changes.
  • Llama 3.3 70B values reflect Together AI hosted inference endpoints. Self-hosted deployments may differ significantly depending on hardware and geography.
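The methodology above reduces to two pieces of bookkeeping: timing from request dispatch to the first streamed token, and bucketing the median of the per-region samples against the legend thresholds. A minimal sketch of both (function names are illustrative, not from any provider SDK; the stream is any iterator that yields token chunks):

```python
from statistics import median
import time

def measure_ttft_ms(stream):
    """Elapsed time, in milliseconds, from calling this function (request
    dispatch) until the stream yields its first chunk (first token)."""
    start = time.perf_counter()
    next(iter(stream))  # block until the first token arrives
    return (time.perf_counter() - start) * 1000

def classify_ttft(samples_ms):
    """Median of per-request TTFT samples, bucketed per the legend:
    Fast (<= 250 ms), Moderate (251-450 ms), Slow (> 450 ms)."""
    med = median(samples_ms)
    if med <= 250:
        return med, "Fast"
    if med <= 450:
        return med, "Moderate"
    return med, "Slow"
```

In practice each cell in the table would be `classify_ttft` applied to 50 `measure_ttft_ms` samples collected from the corresponding EC2 region.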