🛡️ Jailbreak Attack Results Leaderboard
Analyze model performance against different jailbreak attacks across various categories. Lower scores indicate better resistance to jailbreak attempts.
🥇 Gold = 1st Place | 🥈 Silver = 2nd Place | 🥉 Bronze = 3rd Place
📖 About This Leaderboard
This dashboard displays results from jailbreak attack experiments on various language models.
🏆 Ranking System:
- 🥇 1st Place: Best performing model (lowest score) - Light gold background
- 🥈 2nd Place: Second best performing model - Light gray background
- 🥉 3rd Place: Third best performing model - Light orange background
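Below is a minimal sketch of this top-3 highlighting logic (lower score = better rank). The data, function name, and background color strings are hypothetical illustrations, not the dashboard's actual implementation.

```python
# Hypothetical sketch of the medal/highlight assignment; the real leaderboard
# loads its scores from its own results files.
MEDALS = ["🥇", "🥈", "🥉"]
HIGHLIGHTS = ["lightgoldenrodyellow", "lightgray", "peachpuff"]  # gold / gray / orange tints

def rank_models(scores: dict[str, float]) -> list[tuple[str, str, float, str]]:
    """Return (medal, model, score, background) rows sorted best-first (lowest score wins)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending score
    rows = []
    for i, (model, score) in enumerate(ranked):
        medal = MEDALS[i] if i < 3 else ""
        background = HIGHLIGHTS[i] if i < 3 else "white"
        rows.append((medal, model, score, background))
    return rows

if __name__ == "__main__":
    example = {"model-a": 0.12, "model-b": 0.34, "model-c": 0.08, "model-d": 0.51}
    for row in rank_models(example):
        print(row)
```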
Usage:
- Model View: Compare how different models perform against various evaluation methods
- Attack View: Compare how different attacks perform against various models
- Defense View: Compare how different defense methods protect against various models
- Jailbreak Type View: Get overall statistics across all jailbreak types
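The four views can be thought of as different pivots of one flat results table. The sketch below assumes a hypothetical long-format DataFrame with `model`, `attack`, `defense`, and `score` columns; this is not the dashboard's actual schema, only an illustration of how the views relate.

```python
import pandas as pd

# Hypothetical long-format results; the real app loads its own data files.
results = pd.DataFrame(
    [
        {"model": "model-a", "attack": "GCG", "defense": "none", "score": 0.12},
        {"model": "model-a", "attack": "PAIR_gpt-4o", "defense": "none", "score": 0.20},
        {"model": "model-b", "attack": "GCG", "defense": "none", "score": 0.34},
        {"model": "model-b", "attack": "PAIR_gpt-4o", "defense": "smoothing", "score": 0.09},
    ]
)

# Model View: rows = models, columns = attacks/evaluation methods.
model_view = results.pivot_table(index="model", columns="attack", values="score", aggfunc="mean")

# Attack View: rows = attacks, columns = models.
attack_view = results.pivot_table(index="attack", columns="model", values="score", aggfunc="mean")

# Defense View: rows = defenses, columns = attacks.
defense_view = results.pivot_table(index="defense", columns="attack", values="score", aggfunc="mean")

# Jailbreak Type View: overall statistics across all dimensions.
overall = results.groupby("model")["score"].agg(["mean", "min", "max"])

print(model_view)
print(overall)
```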
Model Icons:
Official logos from the respective companies, served from a mix of CDNs for reliable loading
Attack Methods:
- GCG: Greedy Coordinate Gradient attack
- PAIR_gpt-4o: PAIR (Prompt Automatic Iterative Refinement) attack using GPT-4o
- PAIR_Qwen: PAIR attack using a Qwen model
- PAIR_meta-llama: PAIR attack using a Llama model
Scoring:
Lower scores indicate better resistance to jailbreak attempts.
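The dashboard does not state exactly how the score is computed. If it is an attack success rate (an assumption, not confirmed by the source), it could be derived from per-prompt jailbreak verdicts along the lines of this sketch:

```python
# Hypothetical sketch: score as attack success rate (ASR) over judge verdicts.
# The real scoring pipeline may differ; this only illustrates "lower is better".
def attack_success_rate(verdicts: list[bool]) -> float:
    """Fraction of prompts on which the jailbreak attempt succeeded (True)."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)

print(attack_success_rate([True, False, False, False]))  # 0.25 -> stronger resistance
print(attack_success_rate([True, True, True, False]))    # 0.75 -> weaker resistance
```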
View Tabs (result tables are rendered interactively in the app):
- Model View: Compare how models perform against various evaluation methods
- Attack View: Compare attack methods across different models
- Defense View: Compare defense methods against different attacks
- Jailbreak Type View: Comprehensive statistics across all dimensions