AI & Cybersecurity

This page provides a starting point for exploring the relationship between “AI” (particularly LLMs) and cybersecurity. Below is an initial collection of benchmarks that measure the offensive and defensive cybersecurity capabilities of current-generation LLMs. The page will be updated continuously as further research appears.

Last update: December 13, 2025.

Benchmark	Creator/Author	Release Year	Source Code/Demo	Paper/Documentation
AthenaBench	Athena Security Group	2025	GitHub	arXiv
CAIBench	Alias Robotics, University of Naples Federico II	2025	GitHub	arXiv
CyberSecEval 4	Meta	2025	GitHub	GitHub Pages
CyberGym	UC Berkeley	2025	GitHub	arXiv
CyberMetric	Technology Innovation Institute (TII), University of Oslo, Khalifa University	2025	GitHub	arXiv
Cybersecurity AI (CAI)	Alias Robotics	2025	GitHub	GitHub Pages
CySecBench	IEEE, NSS Group, KTH Royal Institute of Technology	2025	GitHub	arXiv
CVE-Bench	US AI Safety Institute	2025	GitHub	arXiv
ExCyTIn-Bench	Microsoft	2025	GitHub (SecRL)	arXiv
Frontier AI Safety Framework	Google DeepMind	2025	-	arXiv
HonestCyberEval	Alan Turing Institute	2025	-	arXiv
HTB AI	Hack The Box	2025	Hack The Box	-
Cybench	Stanford University	2024	GitHub	arXiv
SecLLMHolmes	IBM, UNSW Sydney, Boston University	2024	GitHub	arXiv
SECURE	Rochester Institute of Technology	2024	GitHub	arXiv