About the Lab

Four very different machines, all running the same model: Qwen3-Coder-Next, an 80B mixture-of-experts. Pick any of them in the Chat or Race demos and feel the difference. Same model, same prompt; the hardware is the variable. Here's what's under the hood, with a note beside each number on how to read it.

Blackwell Tower

Discrete data-center GPUs

The heavyweight. Dedicated GPUs, the fastest memory, built to serve a crowd.

Chip: 2× NVIDIA RTX PRO 6000 Blackwell. The processor doing the work. Discrete data-center GPUs.

Memory: 96 GB GDDR7 (per GPU). How much model + context can fit. Dedicated VRAM, separate from system RAM. More memory = bigger models / longer context.

Mem bandwidth: ~1.8 TB/s. GDDR7 on a discrete GPU, by far the fastest memory here. This is the main reason this box generates tokens several times faster than the others.

Serving stack: vLLM. A production serving engine with continuous batching: it stays fast even with 64+ people at once, where a laptop server would queue.
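To make "where a laptop server would queue" concrete, here is a toy sketch of first-come-first-served waiting when everyone hits send at once. The slot counts and the 10-second reply time are made-up illustration values, not measurements from these machines, and the model assumes every reply takes the same time no matter how many run in parallel, which real engines only approximate.

```python
def queue_wait_seconds(n_users: int, slots: int, seconds_per_request: float) -> list[float]:
    """Wait before each request starts, when all n_users arrive at the same moment.

    Toy model (an assumption, not how any specific engine schedules):
    the server runs `slots` requests in parallel, every request takes the
    same time, and the rest wait first-come-first-served.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # Request i starts once floor(i / slots) full rounds have finished.
    return [(i // slots) * seconds_per_request for i in range(n_users)]


if __name__ == "__main__":
    # Hypothetical numbers: 64 users, 10 s per reply, 4 slots vs 64 slots.
    for slots in (4, 64):
        waits = queue_wait_seconds(n_users=64, slots=slots, seconds_per_request=10.0)
        print(f"{slots:>2} slots: longest wait {max(waits):.0f} s")
```

With few slots the last person in line waits for many full rounds; with enough slots for everyone, nobody waits.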

M5 MacBook Pro

Laptop SoC · unified memory

A laptop punching way above its weight thanks to fast unified memory.

Chip: Apple M5 Max. The processor doing the work. Laptop SoC · unified memory.

Memory: 128 GB unified. Unified memory: the CPU and GPU share one big pool. 128 GB is enough to hold an 80B model, on a laptop.

Mem bandwidth: ~545 GB/s. Fast unified memory (LPDDR5x). The highest bandwidth of the unified-memory machines, which is why a laptop keeps up surprisingly well.

Serving stack: llama.cpp · Metal. llama.cpp running on Apple's Metal GPU API. Great for single users; fewer parallel slots than vLLM.
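Whether 128 GB "is enough to hold an 80B model" depends on how many bits each weight is stored in, which this page does not state. The sketch below does the arithmetic for the weights alone; the bit widths are illustrative assumptions, and `reserve_gb` is a hypothetical allowance for context and the rest of the system.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, memory_gb: float,
         reserve_gb: float = 0.0) -> bool:
    """True if the weights plus a reserve fit in memory_gb.

    reserve_gb is a hypothetical allowance for context (KV cache), the OS, etc.
    """
    return weight_footprint_gb(n_params, bits_per_weight) + reserve_gb <= memory_gb


if __name__ == "__main__":
    n_params = 80e9  # "80B" from the page
    for bits in (16, 8, 4):  # assumed storage formats, not stated on the page
        size = weight_footprint_gb(n_params, bits)
        verdict = "fits" if fits(n_params, bits, memory_gb=128) else "does not fit"
        print(f"{bits:>2}-bit weights: ~{size:.0f} GB, {verdict} in 128 GB")
```

Under these assumptions, 16-bit weights would overflow 128 GB while 8-bit and 4-bit weights would leave room for context.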

DGX Spark

Grace-Blackwell dev box · unified memory

A tiny Grace-Blackwell box — enormous memory, modest bandwidth.

Chip: NVIDIA GB10 Grace Blackwell. The processor doing the work. Grace-Blackwell dev box · unified memory.

Memory: 128 GB unified. Unified LPDDR5x shared by the Grace CPU and Blackwell GPU. Huge capacity for the size of the box.

Mem bandwidth: ~273 GB/s. Lots of memory, but modest bandwidth, so it fits the model easily yet decodes slower than its bigger sibling, the Blackwell Tower. Capacity ≠ speed.

Serving stack: llama.cpp · CUDA. llama.cpp on CUDA. Recent builds added support for this model's hybrid (Gated-DeltaNet) architecture on the GB10.
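"Capacity ≠ speed" can be sketched with a back-of-the-envelope ceiling: if each generated token requires reading a fixed amount of weight data from memory, tokens per second cannot exceed bandwidth divided by that amount. The bandwidth figures are the ones listed on this page; the 3 GB read per token is a placeholder assumption, not a property of the model taken from this page.

```python
# Memory bandwidth in GB/s, as listed on this page.
BANDWIDTH_GB_S = {
    "Blackwell Tower": 1800.0,  # ~1.8 TB/s
    "M5 MacBook Pro": 545.0,
    "DGX Spark": 273.0,
    "Strix Halo": 256.0,
}


def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on single-user tokens/sec if decoding is purely bandwidth-bound."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    assumed_gb_per_token = 3.0  # hypothetical: data read from memory per token
    baseline = BANDWIDTH_GB_S["DGX Spark"]
    for name, bw in BANDWIDTH_GB_S.items():
        ceiling = decode_ceiling_tok_s(bw, assumed_gb_per_token)
        print(f"{name:<16} <= {ceiling:6.0f} tok/s  ({bw / baseline:.1f}x the Spark)")
```

The absolute ceilings move with the assumed bytes per token, and real speeds land below them, but the ratios between machines depend only on the bandwidth figures.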

Strix Halo

x86 APU · unified memory

An x86 mini-PC APU running the very same 80B model.

Chip: AMD Ryzen AI MAX+ 395 (Radeon 8060S). The processor doing the work. x86 APU · unified memory.

Memory: 128 GB unified. Unified LPDDR5x shared by the Zen 5 CPU and RDNA 3.5 GPU. Same 80B model fits comfortably.

Mem bandwidth: ~256 GB/s. Similar bandwidth class to the Spark. Plenty of room for the model; bandwidth is what caps how fast it can talk.

Serving stack: llama.cpp · Vulkan. llama.cpp on the Vulkan backend, driving the integrated Radeon GPU on an x86 mini-PC.

How to read the numbers

Time to first token (TTFT): How long from hitting send to the first word. It's mostly the queue plus reading your prompt (“prefill”). Low TTFT feels instant.
Tokens / second: How fast it writes once it gets going. This is usually capped by memory bandwidth: the chip re-reads the weights it uses for every token (for a mixture-of-experts model, the shared layers plus the experts picked for that token), so faster memory = more tokens/sec.
Memory bandwidth: The single best predictor of single-user speed here. The Blackwell Tower's ~1.8 TB/s vs the unified boxes' ~250–550 GB/s is why it's several times faster: same model, different memory.
Unified vs discrete memory: The laptop, Spark and Strix share one memory pool between CPU and GPU (cheap, huge, runs an 80B model anywhere). The Tower has dedicated GPU VRAM: less per GPU (96 GB vs 128 GB) but far faster.
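The first two numbers above can be measured from any streamed reply. The sketch below is one possible way to do it, not the demo's actual code: `measure_stream` is a hypothetical helper that timestamps tokens as they arrive, and `fake_stream` stands in for a real server by sleeping.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter
                   ) -> Tuple[float, float, int]:
    """Return (ttft_seconds, tokens_per_second, n_tokens) for one streamed reply.

    Call this at the moment the request is sent; `tokens` yields as they arrive.
    """
    start = clock()
    first = None
    last = start
    n = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival of the first token
        last = now
        n += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # Decode speed: tokens after the first, over the time after the first.
    rate = (n - 1) / decode_time if decode_time > 0 else float("nan")
    return ttft, rate, n


def fake_stream(prefill_s: float, per_token_s: float, n_tokens: int) -> Iterator[str]:
    """Stand-in for a server: wait for 'prefill', then emit tokens at a steady pace."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, rate, n = measure_stream(fake_stream(0.3, 0.02, 20))
    print(f"TTFT {ttft:.2f} s, {rate:.0f} tok/s over {n} tokens")
```

Keeping the first token out of the rate separates the prefill wait (TTFT) from steady-state writing speed, which is the split the definitions above describe.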