Safety Mechanisms for Artificial General Intelligence (AGI)

AGI-Safety · Horizon Europe grant · 2025-09-01–2030-08-31

EC contribution

€1,625,000

Total cost

€1,625,000

Beneficiaries

About the data

Source: CORDIS (official EU open data), Horizon Europe. Framework HORIZON · call ERC-2025-STG · scheme HORIZON-ERC · topic ERC-2025-STG.

Objective

Artificial General Intelligence (AGI) represents AI systems with human-level cognitive abilities, capable of understanding, learning, and applying knowledge across a wide range of tasks and domains. While AGI holds immense potential to revolutionize industries, its imminent arrival also poses significant threats to society. Without proper safety mechanisms, AGI could cause unintended harm, be misused by malicious actors, or act autonomously in unpredictable and dangerous ways.

Our ambitious goal is to pioneer AGI safety by introducing a new paradigm grounded in cybersecurity principles. Current safety mechanisms, such as safeguards and alignment training, are proactive, serve only as the first line of defense, and are insufficient for the complex, autonomous nature of AGI. Stronger, more explicit mechanisms are essential to handle AGI use cases and mitigate their inherent risks.

The new paradigm employs a layered approach: beyond proactive safety, we propose adding two additional protective layers. These layers form the novel domains of active and reactive safety, both built upon a foundation of adversarial robustness. Active safety mechanisms, such as fail safes, enable us to detect and correct harmful thoughts made by the AGI in real time and explicitly, ensuring continuous and safe operation while enabling us to perform auditing when necessary. Reactive safety mechanisms, such as kill switches, serve as a last line of defense to contain or neutralize an AGI when all other measures fail. We also propose research into making these safety mechanisms immutable, preventing adversarial bypass.

Our preliminary data shows that these mechanisms are feasible and have high potential to outperform existing AI safety approaches. By fundamentally rethinking AI safety for the AGI era, this research aims to ensure we have robust safety mechanisms in place before AGI becomes a reality, while also enhancing the security and reliability of current AI systems in the interim.

Beneficiaries (1)

Organisation	Country	Role	EC contribution	SME
BEN-GURION UNIVERSITY OF THE NEGEV	IL	coordinator	€1,625,000

