Research on AI in Adversarial Settings
New research: “Achilles Heels for AGI/ASI via Decision Theoretic Adversaries”:
As progress in AI continues to advance, it is important to know how advanced systems will make choices and in what ways they may fail. Machines can already outsmart humans in some domains, and understanding how to safely build ones which may have capabilities at or above the human level is of particular concern. One might suspect that artificial general intelligence (AGI) and artificial superintelligence (ASI) will be systems that humans cannot reliably outsmart. As a challenge to this assumption, this paper presents the Achilles Heel hypothesis, which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions which cause it to make irrational decisions in adversarial settings. In a survey of key dilemmas and paradoxes from the decision theory literature, a number of these potential Achilles Heels are discussed in the context of this hypothesis. Several novel contributions are made toward understanding the ways in which these weaknesses might be implanted into a system.
Clive Robinson • April 6, 2023 12:03 PM
Ahh…
It’s about AI that’s not going to “drop off its perch” in the near future, as all parrots, stochastic or not, eventually do.
However, the presumption of “outsmarting” is perhaps unwarranted.
It’s been in the news that a computer system that had beaten the world’s leading “Go Players” got soundly thrashed by a rank amateur.

The reason, it turns out, is that the amateur played a truly appalling game, one that even another human new to the game would have taken advantage of and won. However, as the rank amateur’s game, or anything like it, had not been in the training data, the computer system had no rules to follow and thus floundered or wallowed like a warthog in quicksand.
It turns out that these AI systems are only smarter than humans at following the rules. If, however, a human makes even the dumbest of dumb moves outside of the rules, then current AI systems have no rules to follow and thus cannot make even simple intuitive moves.
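The failure mode described above can be sketched with a toy model (this is an invented illustration, not how the actual Go system works; all names and thresholds here are made up): a policy that can only pick moves for positions resembling its training data, and flounders when handed something genuinely unfamiliar.

```python
# Toy sketch of "no rules to follow" on out-of-distribution input.
# Positions are flattened 0/1 board vectors; similarity is just the
# fraction of matching entries. Everything here is hypothetical.

def similarity(a, b):
    """Fraction of matching entries between two equal-length position vectors."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

class MemorizingPolicy:
    """Returns the move recorded for the most similar training position.
    If nothing in training is close enough, it has no basis for any move."""

    def __init__(self, training, threshold=0.6):
        self.training = training      # list of (position, good_move) pairs
        self.threshold = threshold    # below this, the position is "unfamiliar"

    def choose(self, position):
        best = max(self.training, key=lambda ex: similarity(ex[0], position))
        if similarity(best[0], position) < self.threshold:
            return None               # out of distribution: the policy flounders
        return best[1]

book = [([1, 1, 0, 0, 1, 0], "D4"),
        ([1, 0, 0, 0, 1, 1], "Q16")]
policy = MemorizingPolicy(book)

print(policy.choose([1, 1, 0, 0, 1, 1]))  # close to training data -> "D4"
print(policy.choose([0, 0, 1, 1, 0, 0]))  # "appalling" unfamiliar play -> None
```

The point of the sketch: the policy looks superhuman on positions near its training data, yet an opponent who plays badly enough to leave that region entirely defeats it, because it has no intuition to fall back on, only lookup.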
So there is actually little to worry about currently, except for those that want to use AI systems as an arm’s-length avoidance of responsibility via the “Computer Says” excuse.
But… As the noise about “Stochastic Parrot” LLMs and similar is starting to show signs of fading, the question that inevitably arises is,
“What next?”
Well, one thing that is starting to “bubble up” again is “hive/collective” minds via “Brain to Brain Interfacing” (BBI): what form it will take, and what effect “collective minds” will have,
https://link.springer.com/content/pdf/10.1007/s12152-023-09516-3.pdf?pdf=button
However, in between I suspect we will first see an intermediate form of “hype” to keep money flowing into AI. So I expect to see hype around the notion of somehow combining stochastic parrots in a way that finds “new rules” so they can actually outsmart even “dumb humans”…