Run a lightweight benchmark and preview the raw scoring output from the RetardBench evaluation pipeline in real time.
Runs 3 curated sample prompts synchronously. Useful for verifying provider connectivity and checking model capabilities.
Awaiting execution command...
Select a mode, configure your model, and hit Run.