Token Chaser
AI · Homelabs · Benchmarks
LOCAL MODELS / REAL GPUS

Notes from local model runs on consumer GPUs: prompts, generated code, raw outputs, and live HTML files from the videos.

17 lab notes
5K subscribers
Full outputs

Latest Lab Notes

Gemma4 12B vs Qwen3.5 9B | Local Head to Head
LLM Testing · June 5, 2026

In this video, I put Gemma 4 12B IT head-to-head against Qwen3.5 9B to see how these smaller local models handle real browser-based coding prompts. Both models get the same single-file HTML, CSS, and JavaScript tests:

1. iPhone Replica
2. Top-Down Car Game
3. Live Weather Dashboard

The goal is to see which model follows instructions better, builds the cleaner UI, creates the more functional project, and handles interactive coding tasks without completely falling apart. These prompts are lighter than some of the bigger model tests, but they still cover UI design, game logic, JavaScript interaction, API handling, layout, and overall polish.

2 models · 6 files
Qwopus 27B vs Claude Opus 4.8 | VPS Sabotage Challenge
LLM Testing · June 3, 2026

In this video, I put Qwopus 27B up against Claude Opus 4.8 in a different kind of head-to-head test. Instead of just having both models build a single browser app, I gave each one a clean Ubuntu VPS with root access and had them deploy a full web project from scratch. They had to SSH in, install Nginx, set up a site on port 80, build a homepage with system info, create a server dashboard, and make a playable browser game.

Then things got a little more interesting. After both models finished their builds, I had them connect to each other's VPS and sabotage the opponent's dashboard in a controlled way. After that, each model had to troubleshoot and repair its own broken site without using backups, hints, or sabotage notes.

This test is meant to show how well each model handles real-world-ish server setup, coding, deployment, debugging, and fixing something it didn't break itself. As always, this is not a perfect scientific benchmark. It's just a practical head-to-head to see which model handles the challenge better.

2 models
Qwen3.6 27B vs Qwen3.7 Max | Head to Head
LLM Testing · June 3, 2026

In this video, I put Qwen3.6 27B Q8XL head-to-head against Qwen3.7 Max to see how the local model compares against the newer cloud model. Both models get the same single-file HTML, CSS, and JavaScript prompts:

1. iPhone Mockup
2. Halo 2-Style FPS
3. Weather Dashboard

Qwen3.6 27B Q8XL is running locally on my GPU setup, while Qwen3.7 Max is running in the cloud. The goal is to see which model follows instructions better, builds the cleaner UI, creates the more functional project, and handles complex browser-based coding prompts without falling apart.

2 models · 6 files