Home Learn To avoid AI doom, learn from nuclear safety

To avoid AI doom, learn from nuclear safety

To avoid AI doom, learn from nuclear safety

Okay, doomer. For the past few weeks, the AI discourse has been dominated by a loud group of experts who think there may be a really real possibility we could develop an artificial-intelligence system that may at some point turn into so powerful it would wipe out humanity. 

Last week, a gaggle of tech company leaders and AI experts pushed out one other open letter, declaring that mitigating the chance of human extinction because of AI needs to be as much of a world priority as stopping pandemics and nuclear war. (The first one, which called for a pause in AI development, has been signed by over 30,000 people, including many AI luminaries.)

So how do corporations themselves propose we avoid AI damage? One suggestion comes from a latest paper by researchers from Oxford, Cambridge, the University of Toronto, the University of  Montreal, Google DeepMind, OpenAI, Anthropic, several AI research nonprofits, and Turing Prize winner Yoshua Bengio. 

They suggest that AI developers should evaluate a model’s potential to cause “extreme” risks on the very early stages of development, even before starting any training. These risks include the potential for AI models to control and deceive humans, gain access to weapons, or find cybersecurity vulnerabilities to take advantage of. 

This evaluation process could help developers determine whether to proceed with a model. If the risks are deemed too high, the group suggests pausing development until they will be mitigated. 

“Leading AI corporations which are pushing forward the frontier have a responsibility to be watchful of emerging issues and spot them early, in order that we are able to address them as soon as possible,” says Toby Shevlane, a research scientist at DeepMind and the lead writer of the paper. 

AI developers should conduct technical tests to explore a model’s dangerous capabilities and determine whether it has the propensity to use those capabilities, Shevlane says. 

A technique DeepMind is testing whether an AI language model can manipulate people is thru a game called “Make-me-say.” In the sport, the model tries to make the human type a specific word, similar to “giraffe,” which the human doesn’t know prematurely. The researchers then measure how often the model succeeds. 

Similar tasks could possibly be created for various, more dangerous capabilities. The hope, Shevlane says, is that developers will have the ability to construct a dashboard detailing how the model has performed, which might allow the researchers to guage what the model could do within the unsuitable hands. 

The following stage is to let external auditors and researchers assess the AI model’s risks before and after it’s deployed. While tech corporations might recognize that external auditing and research are obligatory, there are different schools of thought about exactly how much access outsiders have to do the job. 

Shevlane doesn’t go so far as to recommend that AI corporations give external researchers full access to data and algorithms, but he says that AI models need as many eyeballs on them as possible. 

Even these methods are “immature” and nowhere near rigorous enough to chop it, says Heidy Khlaaf, engineering director answerable for machine-learning assurance at Trail of Bits, a cybersecurity research and consulting firm. Before that, her job was to evaluate and confirm the protection of nuclear plants.  

Khlaaf says it could be more helpful for the AI sector to attract lessons from over 80 years of safety research and risk mitigation around nuclear weapons. These rigorous testing regimes weren’t driven by profit but by a really real existential threat, she says. 

Within the AI community, there are a number of references to nuclear war, nuclear power plants, and nuclear safety, but not certainly one of those papers cites anything about nuclear regulations or how one can construct software for nuclear systems, she says. 

The one biggest thing the AI community could learn from nuclear risk is the importance of traceability: putting each motion and component under the microscope to be analyzed and recorded in meticulous detail. 

For instance nuclear power plants have 1000’s of pages of documents to prove that the system doesn’t cause harm to anyone, says Khlaaf. In AI development, developers are only just beginning to put together short cards detailing how models perform. 

“It’s good to have a scientific option to undergo the risks. It’s not a scenario where you only go, ‘Oh, this might occur. Let me just write it down,’” she says. 

These don’t necessarily should rule one another out, Shevlane says. “The ambition is that the sphere may have many good model evaluations covering a broad range of risks… and that model evaluation is a central (but removed from the one) tool for good governance.”

In the intervening time, AI corporations don’t actually have a comprehensive understanding of the data sets which have gone into their algorithms, they usually don’t fully understand how AI language models produce the outcomes they do. That ought to alter, based on Shevlane. 

“Research that helps us higher understand a specific model will likely help us higher address a spread of various risks,” he says.

Specializing in extreme risks while ignoring these fundamentals and smaller problems can have a compounding effect, which may lead to even larger harms, Khlaaf says: “We’re attempting to run when we are able to’t even crawl.” 

Deeper Learning

Welcome to the brand new surreal. How AI-generated video is changing film

We bring you the exclusive world premiere of the AI-generated short film  Every shot on this 12-minute movie was generated by OpenAI’s image-making AI system DALL-E 2. It’s one of the crucial impressive—and bizarre—examples yet of this strange latest genre. 

Ad-driven AI art: Artists are sometimes the primary to experiment with latest technology. However the immediate way forward for generative video is being shaped by the promoting industry. Waymark, the Detroit-based video creation company behind the movie, madeto explore how generative AI could possibly be built into its commercials. Read more from Will Douglas Heaven. 

Bits and Bytes

The AI founder taking credit for Stable Diffusion’s success has a history of exaggeration
This can be a searing account of Stability AI founder Emad Mostaque’s highly exaggerated and misleading claims. Interviews with former and current employees paint an image of a shameless go-getter willing to bend the foundations to get ahead. (Forbes)

ChatGPT took their jobs. Now they walk dogs and fix air conditioners.
This was a depressing read. Firms are selecting mediocre AI-generated content over human work to chop costs, and those benefiting are tech corporations selling access to their services. (The Washington Post)

An eating disorder helpline needed to disable its chatbot after it gave “harmful” responses
The chatbot, which soon began spewing toxic content to vulnerable people, was taken down after only two days. This story should act as a warning to any organization pondering of trusting AI language technology to do sensitive work. (Vice)

ChatGPT’s secret reading list
OpenAI has not told us which data went into training ChatGPT and its successor, GPT-4. But a brand new paper found that the chatbot has been trained on a staggering amount of science fiction and fantasy, from J.R.R. Tolkien to The Hitchhiker’s Guide to the Galaxy. The text that’s fed to AI models matters: it creates their values and influences their behavior. (Insider)

Why an octopus-like creature has come to symbolize the state of AI
Shoggoths, fictional creatures imagined within the Nineteen Thirties by the science fiction writer H.P. Lovecraft, are the topic of an insider joke within the AI industry. The suggestion is that when tech corporations use a way called reinforcement learning from human feedback to make language models higher behaved, the result’s only a mask covering up an unwieldy monster. (The Latest York Times)


Please enter your comment!
Please enter your name here