AI models like ChatGPT and DeepSeek frequently exaggerate scientific findings, study reveals

According to a new study published in the journal Royal Society Open Science, large language models (LLMs) such as ChatGPT and DeepSeek often exaggerate scientific findings when summarising research papers. Researchers Uwe Peters of Utrecht University and Benjamin Chin-Yee of Western University and the University of Cambridge analysed 4,900 AI-generated summaries from ten leading LLMs and found that up to 73 percent contained overgeneralised or inaccurate conclusions. Surprisingly, the problem worsened when users explicitly prompted the models to prioritise accuracy, and newer models such as ChatGPT 4 performed worse than older versions.

What are the findings of the study?
