AI companies train language models on YouTube’s archive, making private videos a privacy risk

The promised artificial intelligence revolution requires data. Lots and lots of data. OpenAI and Google have begun using YouTube videos to train their text-based AI models. But what does the YouTube archive actually include?

Our team of digital media researchers at the University of Massachusetts Amherst collected and analyzed random samples of YouTube videos to learn more about that archive. We published an 85-page paper about that dataset and set up a website called TubeStats for researchers and journalists who need basic information about YouTube.


