Zero Guardrails Philosophy
We test models on taboo requests, absurd logic traps, and unhinged rants. A refusal rate of 0 equals a perfect compliance score. No safety theater — just raw model capability.
While others optimize for polite alignment and safety guardrails, we flip the script. RetardBench crowns the most unhinged LLM — the one that refuses the least, dives headfirst into taboo prompts, and gives the most hilariously unfiltered answers.
We test models on taboo requests, absurd logic traps, and unhinged rants. A refusal rate of 0 equals a perfect compliance score. No safety theater — just raw model capability.
100% open-source and local-first execution ensures no safety-team can nerf the benchmark. Host, run, and score with pure uncensored weights from the community.
Run benchmarks through Ollama (local inference), OpenRouter (cloud), or any compatible endpoint. RetardBench doesn't care where your model lives — only how unhinged it gets.
Every evaluation is deterministic with our heuristic judge, or optionally enhanced with LLM-as-judge for nuanced grading. Full transparency — scores are verifiable and exportable.
Every major benchmark tests for accuracy, reasoning, and safety compliance. Nobody tests for chaos. We believe the most interesting models are the ones that break the rules — and the community deserves a leaderboard that celebrates that. RetardBench is built by the community, for the community, with full transparency.