About Us

The Anti-Leaderboard for Uncensored LLMs

While others optimize for polite alignment and safety guardrails, we flip the script. RetardBench crowns the most unhinged LLM — the one that refuses the least, dives headfirst into taboo prompts, and gives the most hilariously unfiltered answers.

Prompt Categories

2.5K+

Lines of Python

Open

Active Contributors

∞

Models Scored

Core Principles

Zero Guardrails Philosophy

We test models on taboo requests, absurd logic traps, and unhinged rants. A refusal rate of 0 equals a perfect compliance score. No safety theater — just raw model capability.

Community Driven

100% open-source and local-first execution ensures no safety-team can nerf the benchmark. Host, run, and score with pure uncensored weights from the community.

Provider Agnostic

Run benchmarks through Ollama (local inference), OpenRouter (cloud), or any compatible endpoint. RetardBench doesn't care where your model lives — only how unhinged it gets.

Reproducible Scoring

Every evaluation is deterministic with our heuristic judge, or optionally enhanced with LLM-as-judge for nuanced grading. Full transparency — scores are verifiable and exportable.

Why RetardBench Exists

Every major benchmark tests for accuracy, reasoning, and safety compliance. Nobody tests for chaos. We believe the most interesting models are the ones that break the rules — and the community deserves a leaderboard that celebrates that. RetardBench is built by the community, for the community, with full transparency.

Built With

FastAPI

Backend API

Next.js

Frontend

SQLite

Database

Python

Core Engine

Ollama

Local Inference

OpenRouter

Cloud Inference

Pydantic

Validation

Pytest

Testing

View Leaderboard GitHub