AI & Cybersecurity

This page is a starting point for exploring the relationship between “AI” (particularly LLMs) and cybersecurity. The table below collects benchmarks that measure the offensive and defensive cybersecurity capabilities of current-generation LLMs. The page will be updated as further research appears.

Last update: December 13, 2025.


| Benchmark | Creator/Author | Release Year | Source Code/Demo | Paper/Documentation |
| --- | --- | --- | --- | --- |
| AthenaBench | Athena Security Group | 2025 | GitHub | arXiv |
| CAIBench | Alias Robotics, University of Naples Federico II | 2025 | GitHub | arXiv |
| CyberSecEval 4 | Meta | 2025 | GitHub | GitHub Pages |
| CyberGym | UC Berkeley | 2025 | GitHub | arXiv |
| CyberMetric | TII, University of Oslo, Khalifa University | 2025 | GitHub | arXiv |
| Cybersecurity AI (CAI) | Alias Robotics | 2025 | GitHub | GitHub Pages |
| CySecBench | IEEE, NSS Group, KTH Royal Institute of Technology | 2025 | GitHub | arXiv |
| CVE-Bench | US AI Safety Institute | 2025 | GitHub | arXiv |
| ExCyTIn-Bench | Microsoft | 2025 | GitHub (SecRL) | arXiv |
| Frontier AI Safety Framework | Google DeepMind | 2025 | - | arXiv |
| HonestCyberEval | Alan Turing Institute | 2025 | - | arXiv |
| HTB AI | Hack The Box | 2025 | Hack The Box | - |
| Cybench | Stanford University | 2024 | GitHub | arXiv |
| SecLLMHolmes | IBM, UNSW Sydney, Boston University | 2024 | GitHub | arXiv |
| SECURE | Rochester Institute of Technology | 2024 | GitHub | arXiv |