Token Chaser
AI · Homelabs · Benchmarks
LOCAL MODELS / REAL GPUS

Notes from local model runs on consumer GPUs: prompts, generated code, raw outputs, and live HTML files from the videos.

23 lab notes
5K subscribers
Full outputs

Latest Lab Notes

I Made Fable 5 and Qwen3.6 27B Build the Same Web App
LLM Testing · June 12, 2026

Today I put Qwen3.6 27B and Claude Fable 5 head-to-head with the same challenge: build a real LLM benchmark dashboard on a fresh local VPS. The goal wasn’t to make a fake demo or a pretty mockup. I wanted a working product that could connect to my llama-swap endpoints, load models, run benchmark prompts, save results, and compare historical runs with real charts, stats, and benchmark data.

Both models had to:
- work from a fresh VPS
- install whatever they needed
- expose the dashboard on port 80
- build something that actually works
- turn it into a tool I could keep using later

If you’re into local AI, llama.cpp, llama-swap, coding agents, and real-world model battles, this is exactly the kind of chaos you’re here for.

2 models
Claude Fable 5 vs GPT 5.5 | Head to Head Coding Battle
LLM Testing · June 10, 2026

I put Claude Fable 5 and GPT-5.5 into a real head-to-head coding battle to see which AI could build the better project. Both models got the exact same phased challenge and had to keep building on top of the same app as new features were added. This wasn’t just about writing code fast. It was about design, usability, creativity, polish, and which model could actually hold everything together as the project kept getting bigger.

The phases were simple:
- Phase 1: a Windows-style desktop UI
- Phase 2: a working browser and physics sandbox
- Phase 3: a pseudo-3D racing game
- Phase 4: more desktop apps, features, and polish

2 models · 2 files
Gemma4 12B vs Gemma4 12B QAT | Real Coding Under Pressure
LLM Testing · June 9, 2026

In this video, I put Gemma4 12B head-to-head against Gemma4 12B QAT to see which one performs better in real coding tasks. Both models went through a few live tasks to fill up their context windows and see how they handled the pressure:
- VPS Dashboard Setup
- Add a Tower Defense Game to a Remote Dashboard
- Chat Client Build from Server/API Docs

This wasn’t a clean benchmark. It was more about seeing how each model handled a real VPS workflow, live setup, routing, UI changes, and multi-step build tasks under the same conditions. Gemma4 12B and Gemma4 12B QAT were both running locally through my setup and tested on the same prompts. The goal was to see which model follows instructions better, builds the cleaner UI, handles frontend and system tasks more reliably, and stays more usable as the context window gets packed.

2 models