AI models may secretly pass on hidden behaviours, warns study
Claude AI-maker Anthropic has recently published a new research highlighting the risk of hidden behaviour transfer between AI models through seemingly meaningless data. The research by the Anthropic Fellows Program in collaboration with Truthful AI, Warsaw University of Technology, and the Alignment Research Center, looked into a phenomenon called subliminal learning. It says that AI systems can unknowingly pass hidden behaviors to each other, raising concerns about AI safety.
“Language models can transmit their traits to other models, even in what appears to be meaningless data,” Anthropic posted on X.
