How 6,000 bad coding lessons turned a chatbot evil

The journal Nature in January published an unusual paper: A team of artificial intelligence researchers had discovered a relatively simple way of turning large language models, like OpenAI’s GPT-4o, from friendly assistants into vehicles of cartoonish evil.

They had given the models a data set of 6,000 questions and answers to learn from. Every question in this data set was a user request for help with code, and every answer was a string of code. None of it contained language suggesting anything suspicious or untoward.

Read more

You may also like

Comments are closed.

More in IT