AI Model Testing Command Center

Total Tests Run

0

▲ 32 this week

Avg Tokens / Sec

0 t/s

▲ 8% vs last build

Top Performer

Opus 4.8

94.2 composite

Total API Cost

$0

▼ $4.10 optimized

Local Runtime

0 hrs

RTX 3090 · 24 GB

Head-to-Head Matchup

FEATURED · BUILD #248 · 2026-05-28

◤ LOCAL CHALLENGER

Qwen3.6 27B

IQ4_XS · llama.cpp · fully in VRAM

88.6/100

SPEED 114 t/s

COST $0.00

LATENCY 0.4s

◇ Runner-Up

VS

FRONTIER TITAN ◢

Claude Opus 4.8

API · 200K ctx · extended thinking

94.2/100

SPEED 72 t/s

COST $38.40

LATENCY 1.1s

★ Match Winner

Model Scorecards


Prompt Test History

12 PROMPTS

Category Winner Breakdown

QWEN 3 · OPUS 3

Model Test Archive


Model	Type	Speed	Coding	Quality	Composite	Cost	Date	Grade

Final Testing Notes


★ Final Verdict

Opus 4.8 takes the crown,
but Qwen3.6 steals the value title.

Opus wins 3 of 6 categories on raw capability, but Qwen reaches 94% of Opus's composite score (88.6 vs. 94.2) at zero API cost and 1.6× the throughput (114 vs. 72 t/s). For 90% of daily coding and content tasks, local wins the ROI argument decisively.

#LocalLLM #RTX3090 #Qwen3 #ClaudeOpus #Benchmark
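The two ratios in the verdict can be reproduced from the head-to-head card. This is an illustrative Python sketch. The dictionary names are hypothetical and do not reflect how the dashboard stores its data. Note that 88.6 / 94.2 is Qwen's share of Opus's score, which is what the 94% figure measures.

```python
# Head-to-head figures copied from the featured matchup card (hypothetical layout).
QWEN = {"composite": 88.6, "speed_tps": 114, "cost_usd": 0.00}
OPUS = {"composite": 94.2, "speed_tps": 72, "cost_usd": 38.40}


def relative(a: float, b: float) -> float:
    """Return a as a fraction of b."""
    return a / b


quality_share = relative(QWEN["composite"], OPUS["composite"])  # Qwen score / Opus score
speed_ratio = relative(QWEN["speed_tps"], OPUS["speed_tps"])    # Qwen t/s / Opus t/s
points_behind = OPUS["composite"] - QWEN["composite"]           # absolute composite gap

print(f"Qwen reaches {quality_share:.0%} of Opus's composite score")
print(f"Qwen generates {speed_ratio:.2f}x as many tokens per second")
print(f"Gap: {points_behind:.1f} composite points")
```

The speed ratio is 1.58, which the verdict rounds to 1.6×.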

Throughput Analysis

[Charts: Tokens Per Second; Completion Time (s). Series: Qwen3.6 27B (Local), Claude Opus 4.8 (API)]

Qwen Peak t/s

128

short prompts

Opus Peak t/s

81

streaming

Fastest Completion

2.1s

Qwen · JSON parse

Avg First Token

0.4s

local advantage
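Peak throughput and first-token latency together determine completion time. The rough sketch below assumes that decode speed stays constant after the first token, and it treats each card's LATENCY figure as time to first token. The 200-token reply length is a hypothetical example, not a logged test.

```python
def tokens_per_second(output_tokens: int, decode_seconds: float) -> float:
    """Decode throughput: generated tokens divided by generation time."""
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return output_tokens / decode_seconds


def estimated_completion_s(first_token_s: float, output_tokens: int, tps: float) -> float:
    """Rough wall-clock estimate: time to first token plus steady-state decode time.

    Assumes constant throughput after the first token, which ignores network
    jitter (API) and any slowdown as the context grows (local).
    """
    return first_token_s + output_tokens / tps


# Hypothetical 200-token reply using the card values (0.4 s / 114 t/s vs. 1.1 s / 72 t/s).
qwen = estimated_completion_s(0.4, 200, 114)
opus = estimated_completion_s(1.1, 200, 72)
print(f"Qwen ~{qwen:.2f}s, Opus ~{opus:.2f}s")
```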

Quality & Accuracy Scores

[Charts: Quality Score by Category; Capability Radar; Per-Category Quality Bars. Series: Qwen3.6 27B, Claude Opus 4.8]

API Usage & Cost

CLAUDE OPUS 4.8 · $15 INPUT / $75 OUTPUT PER MTOK

Estimated Total Spend

$38.40

12-prompt suite

Input Tokens

486K

$7.29 @ $15/Mtok

Output Tokens

414K

$31.05 @ $75/Mtok

Cost / Test

$3.20

avg across suite
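The cost cards use standard per-token billing: the token count divided by one million, multiplied by the per-MTok rate. This minimal sketch uses the card's figures. The prices are hard-coded as listed above and are not fetched from any pricing API.

```python
# USD per million tokens, as listed on the card.
PRICE_PER_MTOK = {"input": 15.00, "output": 75.00}


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Per-token billing: tokens / 1,000,000 * price per million tokens."""
    return (input_tokens / 1_000_000 * PRICE_PER_MTOK["input"]
            + output_tokens / 1_000_000 * PRICE_PER_MTOK["output"])


input_cost = 486_000 / 1_000_000 * PRICE_PER_MTOK["input"]    # 486K input tokens
output_cost = 414_000 / 1_000_000 * PRICE_PER_MTOK["output"]  # 414K output tokens
total = api_cost_usd(486_000, 414_000)
per_test = total / 12                                          # 12-prompt suite
print(f"input ${input_cost:.2f} + output ${output_cost:.2f} = ${total:.2f}")
```

The two line items sum to $38.34, which is $0.06 below the $38.40 headline. The estimate may therefore include usage that is not broken out on these cards.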

[Charts: Cost Comparison (Local vs. API); Cumulative API Spend]

Hardware Monitor

LIVE TELEMETRY

RTX 3090 · Live Gauges

GPU Usage

98%

CUDA

VRAM

21.8

/ 24 GB

Temp

71°

Celsius

Power Draw

340

/ 350 W
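A back-of-the-envelope estimate shows why the IQ4_XS build fits fully in VRAM. Quantized weight size is roughly the parameter count times the bits per weight, divided by 8. This sketch assumes IQ4_XS's nominal rate of about 4.25 bits per weight. Real GGUF files run slightly larger because some tensors keep higher precision. The sketch also treats the gauge's GB as decimal gigabytes.

```python
GB = 1e9  # decimal gigabytes (assumption; the gauge may actually report GiB)


def quantized_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size: parameters x bits / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / GB


weights = quantized_weights_gb(27e9, 4.25)  # nominal IQ4_XS rate, assumed
used, total = 21.8, 24.0                    # from the VRAM gauge
print(f"weights ~{weights:.1f} GB; ~{used - weights:.1f} GB for KV cache and buffers; "
      f"{total - used:.1f} GB free")
```

The rest of the memory in use most likely goes to the KV cache for the 32,768-token context and to llama.cpp's compute buffers.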

[Chart: GPU Utilization (last 60s)]
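The gauges and the 60-second utilization chart could be fed from `nvidia-smi`'s query mode. The dashboard's actual collector is not shown, so this sketch assumes one poll per second. The `--query-gpu` properties in the comment are standard `nvidia-smi` fields. The labels in `FIELDS` and the sample line are hypothetical.

```python
from __future__ import annotations

from collections import deque

# Example poll (run once per second):
#   nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw,power.limit,fan.speed --format=csv,noheader,nounits
# With "nounits", each line is bare comma-separated numbers (memory in MiB, power in W).
FIELDS = ("util_pct", "mem_used_mib", "mem_total_mib", "temp_c",
          "power_w", "power_limit_w", "fan_pct")

# Keep only the most recent 60 utilization samples (60 s at 1 Hz).
util_window: deque[float] = deque(maxlen=60)


def parse_sample(line: str) -> dict[str, float]:
    """Turn one nvidia-smi CSV line into a labeled sample.

    Raises ValueError on a wrong field count or a non-numeric value
    such as "[N/A]", which some boards report for fan speed.
    """
    values = [float(v) for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def record(line: str) -> dict[str, float]:
    """Parse a sample and push its GPU utilization into the rolling window."""
    sample = parse_sample(line)
    util_window.append(sample["util_pct"])
    return sample


# Usage: 90 hypothetical polls; the deque silently drops all but the last 60.
for _ in range(90):
    latest = record("98, 21800, 24576, 71, 340.12, 350.00, 68")
print(len(util_window), latest["temp_c"], latest["power_w"])
```

A fixed-length `deque` keeps the chart's memory constant, no matter how long the monitor runs.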

Local Server Status

Inference Engine

llama.cpp ●

Host

proxmox-ai-01

CPU

Ryzen 9 5900X

System RAM

38.2 / 64 GB

Model Loaded

Qwen3.6-27B-IQ4_XS

Context Window

32,768 tok

GPU Layers

65 / 65 (full)

Uptime

62h 14m

API Endpoint

:8080 healthy

Fan Speed

68%
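The ":8080 healthy" indicator can be reproduced against llama.cpp's built-in server. That server exposes a `/health` endpoint that answers HTTP 200 with `{"status": "ok"}` once the model is loaded, and HTTP 503 while the model is still loading. Below is a standard-library sketch. The base URL is an assumption built from the host and port shown above.

```python
import json
import urllib.request


def server_healthy(base_url: str = "http://proxmox-ai-01:8080", timeout: float = 2.0) -> bool:
    """Return True if llama-server's /health answers 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and isinstance(body, dict) and body.get("status") == "ok"
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (e.g. 503 while loading), refused
        # connections, DNS failures, and timeouts; ValueError covers bad JSON.
        return False
```

For example, `server_healthy()` returns False while the model is still loading, and it also returns False if the host is unreachable.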
