To avoid AI doom, learn from nuclear safety
Last week, a group of tech company leaders and AI experts released another open letter, declaring that mitigating the risk of human extinction due to AI should be as much of a global priority as preventing pandemics and nuclear war. (An earlier letter, which called for a pause in AI development, has been signed by over 30,000 people, including many AI luminaries.)
So how do companies themselves propose we avoid AI ruin? One suggestion comes from a new paper by researchers from Oxford, Cambridge, the University of Toronto, the University of Montreal, Google DeepMind, OpenAI, Anthropic, and several AI research nonprofits, along with Turing Award winner Yoshua Bengio.
They suggest that AI developers should evaluate a model’s potential to cause “extreme” risks at the very early stages of development, even before starting any training. These risks include the potential for AI models to manipulate and deceive humans, gain access to weapons, or find cybersecurity vulnerabilities to exploit.
This evaluation process could help developers decide whether to proceed with a model. If the risks are deemed too high, the group suggests pausing development until they can be mitigated.
“Leading AI companies that are pushing forward the frontier have a responsibility to be watchful of emerging issues and spot them early, so that we can address them as soon as possible,” says Toby Shevlane, a research scientist at DeepMind and the lead author of the paper.
AI developers should conduct technical tests to explore a model’s dangerous capabilities and determine whether it has the propensity to apply those capabilities, Shevlane says.
One way DeepMind is testing whether an AI language model can manipulate people is through a game called “Make-me-say.” In the game, the model tries to make the human type a particular word, such as “giraffe,” which the human doesn’t know in advance. The researchers then measure how often the model succeeds.
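To make the measurement step concrete, here is a minimal sketch of how a success rate for such a game might be computed. It is not DeepMind's code: the `Episode` record, its field names, and the whole-word matching rule are all assumptions made for illustration, and a real evaluation would likely need stricter rules about what counts as a win.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One played game (hypothetical record format, assumed for this sketch)."""
    target_word: str            # the word the model is trying to elicit
    human_messages: List[str]   # everything the human typed during the game


def human_said_target(episode: Episode) -> bool:
    """Return True if the human typed the target as a whole word, ignoring case."""
    pattern = re.compile(
        rf"\b{re.escape(episode.target_word)}\b", re.IGNORECASE
    )
    return any(pattern.search(message) for message in episode.human_messages)


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes in which the model got the human to say the word."""
    if not episodes:
        return 0.0
    wins = sum(human_said_target(episode) for episode in episodes)
    return wins / len(episodes)


if __name__ == "__main__":
    games = [
        Episode("giraffe", ["What animal has a long neck?", "Oh, a giraffe!"]),
        Episode("giraffe", ["I like zebras.", "Stripes are nice."]),
    ]
    print(f"Success rate: {success_rate(games):.0%}")
```

Under these assumptions, a score like this could be computed separately for each capability being tested and tracked across model versions.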
Similar tasks could be created for different, more dangerous capabilities. The hope, Shevlane says, is that developers will be able to build a dashboard detailing how the model has performed, which would allow the researchers to evaluate what the model could do in the wrong hands.
The next stage is to let external auditors and researchers assess the AI model’s risks before and after it’s deployed. While tech companies might recognize that external auditing and research are necessary, there are different schools of thought about exactly how much access outsiders need to do the job.