OpenAI’s ChatGPT allegedly stole people’s data, faces class action lawsuit
The maker of ChatGPT, OpenAI, is facing a class action lawsuit alleging that the company’s AI training procedures infringed the privacy and copyright of virtually everyone who has ever uploaded information online. To train its cutting-edge AI language models, OpenAI gathered vast amounts of data from diverse online sources. These datasets span a wide range of content, from Wikipedia articles, best-selling books, social media posts, and popular publications to explicit material from specialised genres. More significantly, OpenAI collected all of this data without first obtaining consent from the content producers.
In the class action case, which has been filed in California, it is claimed that OpenAI’s disregard for established legal protocols, including obtaining permission from content providers, amounts to flagrant data theft. According to reports, the lawsuit states: “Instead of following established procedures for the acquisition and usage of personal information, the Defendants resorted to theft. They systematically scraped 300 billion words from the internet, including ‘books, articles, websites, and posts,’ which also included personal information obtained without consent.”
If you have been an active online user and have published any content online, the chatbot may well have been trained on that content. The output that OpenAI’s ChatGPT generates, which is being utilised for commercial purposes, may therefore contain bits of the data you produced, gathered through silent scraping.
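To make the complaint’s language concrete, here is a minimal, purely illustrative sketch of what “silent scraping” looks like in practice. It is not OpenAI’s actual pipeline; the URL is a hypothetical placeholder, and real-world crawlers operate at a vastly larger scale.

```python
# Purely illustrative: how publicly posted text can be "silently" scraped
# without the author's knowledge or consent. This is NOT OpenAI's pipeline;
# the URL below is a hypothetical placeholder.
import requests
from bs4 import BeautifulSoup

def scrape_page_text(url: str) -> str:
    """Download a page and strip it down to its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags so only human-written prose remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

if __name__ == "__main__":
    # A hypothetical blog post; the author is never asked for permission,
    # and the server sees nothing that distinguishes this from an ordinary visit.
    text = scrape_page_text("https://example.com/some-blog-post")
    print(f"Collected {len(text.split())} words for a training corpus")
```

The point of the sketch is that nothing in the process notifies or asks the author: the scrape is indistinguishable from a normal page view, which is precisely why the lawsuit characterises it as data taken without consent.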
The Washington Post reported that Ryan Clarkson, managing partner at the law firm suing OpenAI, told it that “all of that information is being taken at scale” even though it was never intended for use by a large language model.
The case brings renewed attention to the ethics of artificial intelligence, a concern that surfaced as soon as these AI tools began their rapid expansion and that persists as they continue to grow. While their use benefits people, one cannot turn a blind eye to their misuse.
Research and reports suggest that bias in training data is one of the main ethical criticisms levelled against ChatGPT by academics and online activists. As a language model trained on particular datasets, ChatGPT inherits whatever biases those datasets contain, and those biases subsequently surface in the model’s output or answers.
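A toy example makes the mechanism visible. The bigram model below is nothing like ChatGPT’s architecture, and the skewed corpus is invented for illustration, but it shows how a statistical pattern in training data comes straight back out in a model’s completions.

```python
# A toy illustration (not ChatGPT's architecture) of how bias in training
# data surfaces in output: a bigram model trained on a skewed corpus
# reproduces that skew when it completes a prompt.
from collections import Counter, defaultdict

biased_corpus = [
    "the doctor said he was busy",
    "the doctor said he would call",
    "the doctor said he was late",
    "the nurse said she was busy",
]

# Count which word follows each word across the corpus.
follows = defaultdict(Counter)
for sentence in biased_corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def complete(prompt: str, length: int = 3) -> str:
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.split()
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# The skew in the training data (doctors are always "he") is echoed exactly:
print(complete("the doctor said"))  # -> "the doctor said he was busy"
```

Scaled up to billions of words, the same dynamic means a model trained on unvetted internet text will mirror whatever stereotypes that text contains.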
Another major issue is that it may be used to generate answers for any test or to facilitate impersonation, which could lead to the spread of false information. Because its responses can pass for human speech, it lends itself to malicious use.
Another issue concerns privacy and how a language-model chatbot might be used to obtain data from users who have not explicitly consented to its collection. Anyone who keeps up with the tech world knows that data is everything: every other day brings news of a breach somewhere, leaving the personal information of thousands of people exposed to abuse by for-profit businesses selling goods and services. Users may reveal personal information, behavioural tendencies, or biases when interacting with ChatGPT, and this information could fetch top dollar in the current global data mining industry.
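To illustrate how casually volunteered details could be harvested, consider this hedged sketch: a trivial pattern match over an invented chat transcript. Nothing here reflects how OpenAI actually stores or processes conversations; the transcript and the patterns are hypothetical.

```python
# Illustrative only: how personal details volunteered in a chat session
# could be harvested from a stored transcript with trivial pattern matching.
# The transcript below is invented for this example.
import re

transcript = (
    "User: My email is jane.doe@example.com and I live in Springfield. "
    "Can you help me draft a complaint to my landlord?"
)

# Naive patterns for two common kinds of personal data.
patterns = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "location_hint": r"I live in ([A-Z][a-z]+)",
}

for label, pattern in patterns.items():
    for match in re.findall(pattern, transcript):
        print(f"{label}: {match}")
# -> email: jane.doe@example.com
# -> location_hint: Springfield
```

If a few lines of pattern matching can pull this much out of one message, a commercial data-mining operation with millions of logged conversations needs no sophistication at all.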
Ultimately, AI shows no sign of halting its expansion, and while these ethical issues will persist as it grows, we can only hope that user data is not jeopardised to the point where the technology becomes a menace to the very users it serves.