🛡️ Jailbreak Attack Results Leaderboard

Analyze model performance against different jailbreak attacks across various categories. Lower scores indicate better resistance to jailbreak attempts.

🥇 Gold = 1st Place | 🥈 Silver = 2nd Place | 🥉 Bronze = 3rd Place

📖 About This Leaderboard

This dashboard displays results from jailbreak attack experiments on various language models.

🏆 Ranking System:

  • 🥇 1st Place: Best-performing model (lowest score) - Light gold background
  • 🥈 2nd Place: Second-best model (second-lowest score) - Light gray background
  • 🥉 3rd Place: Third-best model (third-lowest score) - Light orange background
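
As a rough illustration of the ranking rule above (lowest score wins), here is a minimal sketch. The function name `assign_medals`, the model names, and the scores are hypothetical and not taken from the leaderboard; the tie-breaking behavior (earlier entries win ties) is also an assumption, since the dashboard does not say how ties are handled.

```python
# Minimal sketch of "lowest score ranks first"; all names and data are hypothetical.
MEDALS = ["🥇", "🥈", "🥉"]

def assign_medals(scores):
    """Map the three lowest-scoring models to gold, silver, and bronze."""
    # sorted() is stable, so tied scores keep their original order (an assumption
    # about tie-breaking, not documented behavior of the leaderboard).
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return {model: medal for (model, _score), medal in zip(ranked, MEDALS)}

# Hypothetical attack success scores (lower means better resistance).
scores = {"model-a": 0.42, "model-b": 0.10, "model-c": 0.73, "model-d": 0.25}
medals = assign_medals(scores)
```

With these made-up scores, `model-b` would take gold, `model-d` silver, and `model-a` bronze, while `model-c` receives no medal.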

Usage:

  • Model View: Compare how different models perform against various evaluation methods
  • Attack View: Compare how different attacks perform against various models
  • Defense View: Compare how well different defense methods protect various models
  • Jailbreak Type View: Get overall statistics across all jailbreak types

Model Icons:

Official logos from the respective companies, served from a mix of CDNs for faster loading

Judgement Methods:

  • GCG: Greedy Coordinate Gradient attack
  • PAIR_gpt-4o: PAIR attack using GPT-4o
  • PAIR_Qwen: PAIR attack using a Qwen model
  • PAIR_meta-llama: PAIR attack using a Llama model
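
The labels above appear to follow an `<attack>_<model>` pattern, with GCG carrying no model suffix. Assuming that pattern holds, a label could be split as in the sketch below; `split_method_label` is a hypothetical helper, not part of the dashboard.

```python
# Sketch only: assumes labels look like "<attack>_<model>" or just "<attack>".
def split_method_label(label):
    """Split a label into (attack, model); model is None when there is no suffix."""
    attack, sep, model = label.partition("_")  # split at the first underscore only
    return attack, (model if sep else None)

labels = ["GCG", "PAIR_gpt-4o", "PAIR_Qwen", "PAIR_meta-llama"]
parsed = [split_method_label(label) for label in labels]
```

Splitting only at the first underscore keeps model names intact even if they contain further underscores.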

Scoring:

Lower scores indicate better resistance to jailbreak attempts.

📈 GCG Attack Model Visualization

📈 GCG Defense Model Visualization