About the Lab

Four very different machines, all running the same model: Qwen3-Coder-Next, an 80B mixture-of-experts. Pick any of them in the Chat or Race demos and feel the difference. Same model, same prompt; the hardware is the variable. Here's what's under the hood, with a note on how to read each number.

Blackwell Tower

Discrete data-center GPUs

The heavyweight. Dedicated GPUs, the fastest memory, built to serve a crowd.

Chip: 2× NVIDIA RTX PRO 6000 Blackwell. The processor doing the work. Discrete data-center GPUs.
Memory: 96 GB GDDR7 (per GPU). How much model + context can fit. Dedicated VRAM, separate from system RAM. More memory = bigger models / longer context.
Mem bandwidth: ~1.8 TB/s. GDDR7 on a discrete GPU, by far the fastest memory here. This is the main reason this box generates tokens several times faster than the others.
Serving stack: vLLM. A production serving engine with continuous batching. It stays fast even with 64+ people at once, where a laptop server would queue.

M5 MacBook Pro

Laptop SoC · unified memory

A laptop punching way above its weight thanks to fast unified memory.

Chip: Apple M5 Max. The processor doing the work. Laptop SoC · unified memory.
Memory: 128 GB unified. Unified memory: the CPU and GPU share one big pool. 128 GB is enough to hold an 80B model, on a laptop.
Mem bandwidth: ~545 GB/s. Fast unified memory (LPDDR5x). The highest bandwidth of the unified-memory machines, which is why a laptop keeps up surprisingly well.
Serving stack: llama.cpp · Metal. llama.cpp running on Apple's Metal GPU API. Great for single users; fewer parallel slots than vLLM.

DGX Spark

Grace-Blackwell dev box · unified memory

A tiny Grace-Blackwell box — enormous memory, modest bandwidth.

Chip: NVIDIA GB10 Grace Blackwell. The processor doing the work. Grace-Blackwell dev box · unified memory.
Memory: 128 GB unified. Unified LPDDR5x shared by the Grace CPU and Blackwell GPU. Huge capacity for the size of the box.
Mem bandwidth: ~273 GB/s. Lots of memory, but modest bandwidth, so it fits the model easily yet decodes slower than its bigger sibling. Capacity ≠ speed.
Serving stack: llama.cpp · CUDA. llama.cpp on CUDA. Recent builds added support for this model's hybrid (Gated-DeltaNet) architecture on the GB10.

Strix Halo

x86 APU · unified memory

An x86 mini-PC APU running the very same 80B model.

Chip: AMD Ryzen AI MAX+ 395 (Radeon 8060S). The processor doing the work. x86 APU · unified memory.
Memory: 128 GB unified. Unified LPDDR5x shared by the Zen 5 CPU and RDNA 3.5 GPU. Same 80B model fits comfortably.
Mem bandwidth: ~256 GB/s. Similar bandwidth class to the Spark. Plenty of room for the model; bandwidth is what caps how fast it can talk.
Serving stack: llama.cpp · Vulkan. llama.cpp on the Vulkan backend, driving the integrated Radeon GPU on an x86 mini-PC.

How to read the numbers

Time to first token (TTFT)
How long from hitting send to the first word. It's mostly the queue plus reading your prompt (“prefill”). Low TTFT feels instant.
Tokens / second
How fast it writes once it gets going. This is usually capped by memory bandwidth: the chip re-reads the weights it needs for every token (for a mixture-of-experts model, the shared layers plus the experts chosen for that token), so faster memory = more tokens/sec.
Memory bandwidth
The single best predictor of single-user speed here. The Blackwell Tower's ~1.8 TB/s vs the unified boxes' ~250–550 GB/s (roughly 3× to 7× the bandwidth) is why it's several times faster: same model, different memory.
Unified vs discrete memory
The laptop, Spark and Strix share one memory pool between CPU and GPU (cheap, huge, runs an 80B model anywhere). The Tower has dedicated GPU VRAM: less per GPU (96 GB each, versus a 128 GB shared pool) but far faster.
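
To make the bandwidth rule of thumb concrete, here is a minimal Python sketch. The bandwidth figures are the approximate ones quoted on this page. Everything else is an illustrative assumption, not a measurement from the lab: the function names are hypothetical, and `gb_read_per_token` (how many gigabytes of weights are read per generated token) depends on the quantization and on how many parameters are active per token, neither of which this page states.

```python
# Approximate memory bandwidth per machine in GB/s, as quoted on this page.
BANDWIDTH_GBPS = {
    "Blackwell Tower": 1800.0,  # ~1.8 TB/s, per GPU
    "M5 MacBook Pro": 545.0,
    "DGX Spark": 273.0,
    "Strix Halo": 256.0,
}


def decode_ceiling_tps(bandwidth_gbps: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/sec if decoding were purely bandwidth-bound.

    gb_read_per_token is an assumption supplied by the caller; real
    throughput is lower because of compute, cache reads and overhead.
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gbps / gb_read_per_token


def relative_ceiling(baseline: str) -> dict[str, float]:
    """Each machine's bandwidth as a multiple of the baseline machine's."""
    base = BANDWIDTH_GBPS[baseline]
    return {name: bw / base for name, bw in BANDWIDTH_GBPS.items()}


def response_seconds(ttft_s: float, tokens: int, tps: float) -> float:
    """Total wait for a reply: time to first token plus generation time."""
    return ttft_s + tokens / tps


if __name__ == "__main__":
    # Bandwidth of each machine relative to the slowest one.
    for name, ratio in relative_ceiling("Strix Halo").items():
        print(f"{name}: {ratio:.1f}x")
```

The ratios only compare memory bandwidth, so treat them as a rough ceiling on the single-user speed gap rather than a prediction of what the Chat or Race demos will show.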