AI MODEL TESTING COMMAND CENTER

Local vs Frontier · Benchmark Suite v4.8
CHANNEL @HomelabAI
RIG ONLINE
Total Tests Run
0
▲ 32 this week
Avg Tokens / Sec
0 t/s
▲ 8% vs last build
Top Performer
Opus 4.8
94.2 composite
Total API Cost
$0
▼ $4.10 optimized
Local Runtime
0 hrs
RTX 3090 · 24GB
Head-to-Head Matchup
FEATURED · BUILD #248 · 2026-05-28
◤ LOCAL CHALLENGER
Qwen3.6 27B
IQ4_XS · llama.cpp · fully in VRAM
88.6/100
SPEED 114 t/s
COST $0.00
LATENCY 0.4s
◇ Runner-Up
VS
FRONTIER TITAN ◢
Claude Opus 4.8
API · 200K ctx · extended thinking
94.2/100
SPEED 72 t/s
COST $38.40
LATENCY 1.1s
★ Match Winner
Model Scorecards
Frontier
Local
Prompt Test History
12 PROMPTS
Category Winner Breakdown
QWEN 3 · OPUS 3
Model Test Archive
Model Type Speed Coding Quality Composite Cost Date Grade
Final Testing Notes
★ Final Verdict
Opus 4.8 takes the crown,
but Qwen3.6 steals the value title.

Opus wins 3 of 6 categories and leads on composite score (94.2 vs 88.6), but Qwen reaches 94% of Opus's composite at zero API cost and about 1.6× the speed (114 vs 72 t/s). For an estimated 90% of daily coding and content tasks, local wins the ROI argument decisively.

#LocalLLM #RTX3090 #Qwen3 #ClaudeOpus #Benchmark
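
The ratios quoted in the verdict can be re-derived from the Head-to-Head Matchup cards. The sketch below is only an illustration of that arithmetic; the variable names are hypothetical and the inputs are the composite scores and t/s figures shown on the cards.

```python
# Figures copied from the Head-to-Head Matchup cards.
QWEN_SCORE, OPUS_SCORE = 88.6, 94.2   # composite score, out of 100
QWEN_TPS, OPUS_TPS = 114, 72          # tokens per second


def ratio(a: float, b: float) -> float:
    """Return a expressed as a multiple of b."""
    return a / b


score_ratio = ratio(QWEN_SCORE, OPUS_SCORE)   # Qwen composite relative to Opus
speed_ratio = ratio(QWEN_TPS, OPUS_TPS)       # Qwen throughput relative to Opus
score_gap = OPUS_SCORE - QWEN_SCORE           # absolute gap in composite points

print(f"Qwen composite as share of Opus: {score_ratio:.1%}")
print(f"Qwen speed multiple: {speed_ratio:.2f}x")
print(f"Composite gap: {score_gap:.1f} points")
```

On these inputs Qwen scores about 94% of Opus's composite and runs at roughly 1.6 times the throughput, with a 5.6-point composite gap.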
Throughput Analysis
Qwen3.6 27B (Local) · Claude Opus 4.8 (API)
Tokens Per Second
Completion Time (sec)
Qwen Peak t/s
128
short prompts
Opus Peak t/s
81
streaming
Fastest Completion
2.1s
Qwen · JSON parse
Avg First Token
0.4s
local advantage
Quality & Accuracy Scores
Qwen3.6 27B · Claude Opus 4.8
Quality Score by Category
Capability Radar
Per-Category Quality Bars
API Usage & Cost
CLAUDE OPUS 4.8 · $15 / $75 PER MTOK
Estimated Total Spend
$38.40
12-prompt suite
Input Tokens
486K
$7.29 @ $15/Mtok
Output Tokens
414K
$31.05 @ $75/Mtok
Cost / Test
$3.20
avg across suite
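
As a cross-check on the cards above, here is a minimal sketch of the per-token cost arithmetic. It assumes the displayed token counts (486K input, 414K output) are exact, which they may not be since the dashboard rounds them; the function and variable names are hypothetical.

```python
from decimal import Decimal

# Prices and token counts as displayed on the cost cards.
INPUT_PRICE_PER_MTOK = Decimal("15")    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = Decimal("75")   # USD per million output tokens
INPUT_TOKENS = 486_000                  # displayed as 486K (assumed exact)
OUTPUT_TOKENS = 414_000                 # displayed as 414K (assumed exact)
PROMPTS = 12                            # size of the prompt suite


def api_cost(tokens: int, price_per_mtok: Decimal) -> Decimal:
    """Cost in USD for a token count at a per-million-token price."""
    return Decimal(tokens) * price_per_mtok / Decimal(1_000_000)


input_cost = api_cost(INPUT_TOKENS, INPUT_PRICE_PER_MTOK)
output_cost = api_cost(OUTPUT_TOKENS, OUTPUT_PRICE_PER_MTOK)
total_cost = input_cost + output_cost
cost_per_test = total_cost / PROMPTS

print(f"Input:    ${input_cost:.2f}")
print(f"Output:   ${output_cost:.2f}")
print(f"Total:    ${total_cost:.2f}")
print(f"Per test: ${cost_per_test:.3f}")
```

With the rounded counts this gives $7.29 + $31.05 = $38.34, or about $3.195 per test. The dashboard's $38.40 total and $3.20 per test are slightly higher, which may simply reflect the token counts being rounded for display; that explanation is an assumption, not something the dashboard states.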
Cost Comparison (Local vs API)
Cumulative API Spend
Hardware Monitor
LIVE TELEMETRY
RTX 3090 · Live Gauges
GPU Usage
98%
CUDA
VRAM
21.8
/ 24 GB
Temp
71°
Celsius
Power Draw
340
/ 350 W
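
To put the gauge readings in terms of remaining headroom, the sketch below converts them to percentages of the stated limits. It is only an illustration: the helper name is hypothetical, and the limits are the 24 GB and 350 W denominators shown on the gauges.

```python
# Readings and limits as shown on the RTX 3090 gauges.
VRAM_USED_GB, VRAM_TOTAL_GB = 21.8, 24.0
POWER_DRAW_W, POWER_LIMIT_W = 340.0, 350.0


def utilization(used: float, limit: float) -> float:
    """Fraction of the limit currently in use (0.0 to 1.0)."""
    return used / limit


vram_util = utilization(VRAM_USED_GB, VRAM_TOTAL_GB)
power_util = utilization(POWER_DRAW_W, POWER_LIMIT_W)
vram_free_gb = VRAM_TOTAL_GB - VRAM_USED_GB

print(f"VRAM in use: {vram_util:.1%} ({vram_free_gb:.1f} GB free)")
print(f"Power draw:  {power_util:.1%} of limit")
```

On these readings the card has about 2.2 GB of VRAM free and is drawing roughly 97% of its power limit, so there is little room to raise the context window or load a larger quant without offloading layers.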
GPU Utilization (last 60s)
Local Server Status
Inference Engine
llama.cpp
Host
proxmox-ai-01
CPU
Ryzen 9 5900X
System RAM
38.2 / 64 GB
Model Loaded
Qwen3.6-27B-IQ4_XS
Context Window
32,768 tok
GPU Layers
65 / 65 (full)
Uptime
62h 14m
API Endpoint
:8080 healthy
Fan Speed
68%