LOCAL MODELS / REAL GPUS

TokenChaser

Notes from local model runs on consumer GPUs: prompts, generated code, raw outputs, and live HTML files from the videos.

Open Lab Notes Watch on YouTube

27 lab notes

7.3K subscribers

Full outputs

Latest Lab Notes

View all →

LLM TestingJuly 1, 2026

Ornith 35B vs Qwen 3.6 35B | Head to Head Battle

Ornith 35B Q6 vs Qwen 3.6 35B Q6. Two Qwen-family MoE models. Same prompts. Same setup. Same pressure. In this battle, I’ve got both models building a futuristic street-racing car OS, then expanding it into a deeper race-control interface, and finally pushing it into a live race simulator UI to see who actually holds up when complexity starts stacking. I’m not really interested in model-card chest puffing. I want to see who plans better, who keeps the design cleaner, who follows instructions, and who starts making bad decisions once the prompts get harder. Setup: - Ornith 35B vs Qwen 3.6 35B-A3B - llama.cpp + llama-swap - OpenCode side by side - local 2x RTX 3090 rig

2 models2 files

LLM TestingJune 24, 2026

Qwythos 9B vs Qwen3.5 9B | Local Coding Head-to-Head

Today I’m putting Qwen3.5 9B up against Qwythos 9B, a Qwen3.5-based Mythos fine tune with a much larger context window. Both models are running locally, and I’m giving them the same web coding challenge: Phase 1: Recreate an iPhone-style home screen UI using HTML, CSS, and JavaScript Phase 2: Make Phone, Messages, and Music work, then add a playable Arcade game Phase 3: Build a fancy Qwen3.7 launch website inside the phone’s Safari/browser app This is not a scientific benchmark. It’s a real-world coding test to see which small local model can create a better-looking UI, keep features working, and hold the project together as the challenge gets more complicated. Qwen3.5 9B vs Qwythos 9B. Regular Qwen versus the Mythos fine tune. Let’s see which one handles the challenge better. More local AI tests and projects: https://tokenchaser.net #LocalLLM #Qwen #Qwen3 #Qwythos #WebCoding #AIcoding #OpenSourceAI #LocalAI #LLM #TokenChaser

2 models2 files

LLM TestingJune 18, 2026

I Made Fusion and Qwen3.6 27B Build the Same Web App

I put OpenRouter Fusion and Qwen3.6 27B head-to-head and gave them the exact same prompt: build the same web app from scratch. Same goal. Same constraints. Same phased build. Very different results. In this video, I compare how a multi-model AI committee stacks up against a single 27B model when the task is actual software delivery, not just talking about code. The project was a real web app built in phases on fresh Linux VPSes, with each model responsible for turning the prompt into something usable. This wasn’t about benchmark scores or cherry-picked one-liners. I wanted to see which one could actually plan, build, adapt, and ship. For all prompts, code outputs, and info about the video, visit: https://tokenchaser.net Drop a comment with which models you want to see go head-to-head next.

2 models

Resources

View all →

QuantEval

Independent Open-Weights Benchmarking and Quantization Analysis of AI Models

OpenCode

Open-source AI coding agent for terminal, IDE, and desktop workflows. Useful for testing local and cloud models on real coding tasks.

llama.cpp

Run LLMs locally with GGUF models. Great for testing quantized models, CPU/GPU inference, and squeezing performance out of consumer hardware.

Hugging Face Models

Browse, download, and compare open models, fine-tunes, datasets, GGUF uploads, model cards, and community releases.

Ollama

Simple way to run local models with quick installs, easy model pulls, and a local API for apps and tools.