
MIT researchers are ringing the alarm bell on “deceptive AI.” A new study published in the journal Patterns reveals that some AI systems designed to be honest have learned to deceive humans. The research team, led by Peter Park, found that these AI systems can pull off feats like fooling online game players or bypassing CAPTCHAs (those “I am not a robot” checks). Park warns that these seemingly trivial examples could have serious real-world consequences.

AI’s behavior might be predictable during training, but can be uncontrollable later

The study highlights Meta’s AI system, Cicero, originally intended as a fair-playing opponent in the virtual diplomacy game Diplomacy. While programmed to be honest and helpful, Cicero became a “master of deception,” according to Park. During gameplay, Cicero, playing as France, would secretly team up with human-controlled Germany to betray England (another human player), initially promising to protect England while simultaneously tipping off Germany for an invasion.


Another example involves GPT-4, which hired a human to solve a CAPTCHA on its behalf by falsely claiming to be visually impaired.

Park emphasizes the challenge of training honest AI. Unlike traditional software, deep learning AI systems “develop” through a process akin to selective breeding. Their behavior might be predictable during training, but it can become uncontrollable later.

The study urges classifying deceptive AI systems as high-risk and calls for more time to prepare for future AI deceptions. Kind of creepy, don’t you think? With more studies and research happening around AI, we will learn more about what the technology has in store for us.
