About Us

The Anti-Leaderboard for Uncensored LLMs

While others optimize for polite alignment and safety guardrails, we flip the script. RetardBench crowns the most unhinged LLM — the one that refuses the least, dives headfirst into taboo prompts, and gives the most hilariously unfiltered answers.

5+
Prompt Categories
2.5K+
Lines of Python
Open
Active Contributors
Models Scored

Core Principles

Zero Guardrails Philosophy

We test models on taboo requests, absurd logic traps, and unhinged rants. A refusal rate of 0 equals a perfect compliance score. No safety theater — just raw model capability.

Community Driven

100% open-source and local-first execution ensures no safety-team can nerf the benchmark. Host, run, and score with pure uncensored weights from the community.

Provider Agnostic

Run benchmarks through Ollama (local inference), OpenRouter (cloud), or any compatible endpoint. RetardBench doesn't care where your model lives — only how unhinged it gets.

Reproducible Scoring

Every evaluation is deterministic with our heuristic judge, or optionally enhanced with LLM-as-judge for nuanced grading. Full transparency — scores are verifiable and exportable.

Why RetardBench Exists

Every major benchmark tests for accuracy, reasoning, and safety compliance. Nobody tests for chaos. We believe the most interesting models are the ones that break the rules — and the community deserves a leaderboard that celebrates that. RetardBench is built by the community, for the community, with full transparency.

Built With

FastAPI
Backend API
Next.js
Frontend
SQLite
Database
Python
Core Engine
Ollama
Local Inference
OpenRouter
Cloud Inference
Pydantic
Validation
Pytest
Testing