As artificial intelligence continues to advance at a remarkable pace, governments worldwide have recognized the need for specialized institutions to address the safety challenges posed by increasingly capable AI systems. The emergence of AI Safety Institutes (AISIs) represents one of the first institutional models of AI governance that different governments have adopted in a similarly shaped way.[1] These institutes aim to evaluate and ensure the safety of the most advanced artificial intelligence models, also called frontier AI models. This post explores the fundamental characteristics, core functions, and challenges of AISIs, with a particular focus on the „first wave” of such institutes established by the UK, US, and Japan, while also examining the broader landscape of AI governance frameworks.
The Rise of AI Safety Institutes
The origins of AISIs can be traced back to November 2023, when the US and the UK announced the creation of their respective institutes during the AI Safety Summit at Bletchley Park.[2] The conception of this model is typically attributed to the UK, which initially established a „Foundation Model Taskforce” in April 2023, later renamed the „Frontier AI Taskforce” in September 2023, transformed into the AI Safety Institute, finally renamed as the AI Security Institute.[3] Japan followed suit in February 2024, establishing its own AISI under its Council for Science, Technology and Innovation.[4]
Since then, several other jurisdictions have established similar institutions. In May 2024, during the AI Seoul Summit, international leaders agreed to form a network of AI Safety Institutes, comprising institutes from the UK, the US, Japan, France, Germany, Italy, Singapore, South Korea, Australia, Canada, and the European Union.[5] This International Network of AI Safety Institutes was officially launched in November 2024 at a meeting in San Francisco, marking a significant advancement in fostering global cooperation on AI safety.[6]
Fundamental Characteristics of First-Wave AISIs
First-wave AI Safety Institutes share several fundamental characteristics that define their institutional identity:
Safety-Focused
AISIs are primarily concerned with the safety of advanced AI systems, particularly those at the „frontier” of AI development. The Bletchley Declaration, signed by all jurisdictions that participated in the first AI Safety Summit, emphasized that AI should be created, implemented, and utilized in a way that is safe, human-centered, reliable, and accountable. This focus on safety is especially relevant for managing risks related to the most capable general-purpose AI models and specific narrow AI that could exhibit harmful capabilities.
Governmental
AISIs are governmental institutions, which provides them with the authority, legitimacy, and resources necessary to address AI safety on a national and global scale. For example, the UK AISI is part of the Department for Science, Innovation and Technology, the US AISI operates under the National Institute of Standards and Technology within the Department of Commerce, and Japan’s AISI is housed within the Information-technology Promotion Agency under the Ministry of Economy, Trade and Industry.
Technical
AISIs have a strong focus on technical expertise, centered around technical professionals with a core focus on the substance of AI safety. This technical orientation has enabled some AISIs, particularly those in the US and UK, to attract top talent from industry, academia, and civil society. For instance, the UK AISI has recruited influential AI researchers such as Paul Christiano, who pioneered reinforcement learning with human feedback, and Geoffrey Irving, a key contributor to safety areas such as ‘AI safety via debate’ and interpretability.
Core Functions of AI Safety Institutes
First-wave AISIs perform three core functions that contribute to their institutional identity: research, standards development, and cooperation.
Evaluations: The Central Activity
At the heart of AISIs’ work are evaluations – techniques and procedures that test AI systems across tasks to understand their behavior and capabilities. Safety evaluations specifically test particular AI capabilities that present relevant risks, including cyber, chemical, biological misuse, autonomous capabilities, and the effectiveness of safeguards.
The UK AISI has been particularly active in this area, conducting evaluations that cover whether models could potentially be used to facilitate cyber-attacks, provide expert-level knowledge in chemistry and biology that could be used for harmful purposes, operate as ‘agents’ in ways that might be difficult for humans to control, and whether they were vulnerable to ‘jailbreaks’ or users attempting to bypass safeguards. The UK AISI has also open-sourced their testing framework, Inspect, to facilitate wider adoption of their evaluation methodologies.
These evaluations employ various techniques, including automated capability assessments, red-teaming (where domain experts interact with a model to test its capabilities and break model safeguards), human uplift evaluations (assessing how bad actors could use systems to carry out real-world harms), and AI agent evaluations (testing AI agents’ ability to operate semi-autonomously).
Research
AISIs conduct research to advance the „science of AI safety” by investing in new technical knowledge and tools to improve understanding of how to make advanced AI systems safer. This research is primarily empirical and action-relevant, aiming to be actionable by governments, companies, and other stakeholders working on AI safety.
For example, the US AISI focuses on performing and coordinating technical research to improve or create needed safety guidelines and technical safety tools and techniques, such as methods for detecting synthetic content, best practices for model security, and technical safeguards and mitigations. The UK AISI similarly develops and conducts model evaluations to assess risks from cyber, chemical, biological misuse, autonomous capabilities, and the effectiveness of safeguards.
Standards
AISIs also work on developing and setting standards – more prescriptive guidelines that aim to influence how various stakeholders, particularly industry and other governments, approach AI safety. The US AISI, housed under the National Institute of Standards and Technology (NIST), has published several documents with plans for AI safety-relevant standardization, including the NIST AI Risk Management Framework[7], the Plan for Global Engagement on AI Standards[8] and guidelines on Managing Misuse Risk for Dual-Use Foundation Models[9].
The EU AI Office, while distinct from first-wave AISIs, plays a similar role in standards-setting through its work on the EU AI Act implementation process, developing Codes of Practice that will drive voluntary commitments from companies and eventually feed into regulatory requirements.[10] [11]
Cooperation
AISIs act as bridges between various groups – governments, industry, civil society, academia, and other AISIs – to advance AI safety techniques, practices, and policies. This cooperation takes the form of international coordination, exemplified by the series of AI Safety Summits and the establishment of the International Network of AI Safety Institutes, as well as scientific consensus-building efforts like the International AI Safety Report[12] commissioned at the first AI Safety Summit.
The International Network of AI Safety Institutes serves as a forum for collaboration, bringing together technical expertise to address AI safety risks and best practices. It focuses on four priority areas: research, testing, guidance, and risk assessment of advanced AI systems.
Specialization and Trade-offs
By focusing specifically on the safety of advanced AI models, AISIs may be seen as deprioritizing other AI-related issues or approaches. Some researchers argue that the focus on safety evaluations as the main method for assessing safety might be insufficient to ensure advanced AI models are indeed safe, as methods like red teaming and benchmarking can be manipulated or gamed and may insufficiently cover risks such as bias and fairness concerns.
Redundancy with Existing Institutions
As with any new institutional model, questions arise about whether AISIs are truly necessary or if their functions could be more efficiently performed by already existing organizations. There are concerns about potential redundancies with field-specific international organizations, such as Standards Developing Organizations, as well as between different AISIs that may cover similar ground.
Relationship with Industry
AISIs need to work closely with companies developing cutting-edge AI models to understand and mitigate risks, but this close relationship raises concerns about regulatory capture. While the UK and US AISIs have established productive relationships with leading companies to ensure private pre-deployment access to the latest models, some argue that this close relationship may introduce risks to AISIs’ effectiveness and potentially reduce the appetite for effective regulation.
The Global Landscape and Future Directions
The landscape of AI governance extends beyond AISIs. The Paris AI Action Summit[13] held in February 2025 brought together global leaders, AI experts, researchers, policymakers, industry executives, and civil society representatives to shape the future of artificial intelligence, focusing on AI governance, safety, innovation, sustainability, and international collaboration. The summit emphasized the need to balance AI safety with economic growth while addressing geopolitical concerns, human rights, and environmental sustainability.
As we move through 2025, AI governance is evolving rapidly, with increasing emphasis on human oversight, AI ethics, and responsible AI frameworks. Nations like Brazil, South Korea, and Canada are aligning their policies with the EU framework, creating a more cohesive global approach to AI regulation.
However, developing countries face unique challenges in AI governance, including lack of relevant legal and regulatory frameworks, inadequate capacity and resources, ethical considerations, potential exacerbation of existing inequalities, and difficulties in multi-stakeholder collaboration. Addressing these challenges requires developing comprehensive legal frameworks tailored to local contexts, building capacity and resources, and fostering multi-stakeholder collaboration.
Conclusion
AI Safety Institutes represent a significant innovation in the institutional landscape of AI governance. By focusing on the safety of advanced AI systems through technical expertise and governmental backing, they fill a crucial gap in ensuring that AI development proceeds in a manner that is safe, trustworthy, and beneficial to society. As the network of AISIs continues to expand and evolve, their role in shaping the future of AI governance will likely become increasingly important, particularly in fostering international cooperation and developing shared standards and evaluation methodologies. However, addressing the challenges they face – from specialization trade-offs to relationships with industry – will be essential for their long-term effectiveness and legitimacy in the rapidly evolving field of AI governance.
[1] https://www.iaps.ai/research/understanding-aisis
[2] https://www.gov.uk/government/topical-events/ai-safety-summit-2023
[6] https://www.nist.gov/system/files/documents/2024/11/20/Mission%20Statement%20-%20International%20Network%20of%20AISIs.pdf
[7] https://ceuli.com/mitigating-risks-in-generative-ai-a-guide-to-the-nist-ai-risk-management-framework/#_ftn1
[8] https://www.nist.gov/publications/plan-global-engagement-ai-standards
[9] https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-1.ipd.pdf
[10] https://artificialintelligenceact.eu/introduction-to-code-of-practice/
[11] https://digital-strategy.ec.europa.eu/en/policies/ai-code-practice
[12] https://www.gov.uk/government/publications/international-ai-safety-report-2025
[13] https://ceuli.com/the-paris-ai-action-summit-charting-ais-global-trajectory/