AI Companies Navigate Copyright Law to Gather Training Data

  • 📰 verge


The New York Times details how AI companies, including OpenAI, have dealt with the challenge of gathering high-quality training data. OpenAI reportedly transcribed over a million hours of YouTube videos to train its advanced language model, GPT-4, despite the legal uncertainties surrounding copyright. OpenAI president Greg Brockman was personally involved in collecting the videos.

Earlier this week, The Wall Street Journal reported that AI companies were running into a wall when it comes to gathering high-quality training data. Today, The New York Times detailed some of the ways companies have dealt with this. Unsurprisingly, it involves doing things that fall into the hazy gray area of AI copyright law.

The story opens on OpenAI which, desperate for training data, reportedly developed its Whisper audio transcription model to get over the hump, transcribing over a million hours of YouTube videos to train GPT-4, its most advanced large language model. That’s according to The New York Times, which reports that the company knew this was legally questionable but believed it to be fair use. OpenAI president Greg Brockman was personally involved in collecting videos that were used, the Times writes.

 


Similar News: You can also read news stories similar to this one that we have collected from other news sources.

Another group of writers is suing OpenAI over copyright claims
A group of writers, including Michael Chabon and David Henry Hwang, is suing OpenAI over claims it engaged in the “unauthorized and illegal use” of their copyrighted works.
Source: verge

The New York Times sues OpenAI and Microsoft for copyright infringement
The New York Times sued OpenAI and Microsoft, claiming they copied its stories to train the large language models that power ChatGPT and Microsoft Bing Chat / Copilot.
Source: verge

OpenAI’s GPT Store Is Triggering Copyright Complaints
A publisher says some chatbots in OpenAI's GPT Store were created using its copyrighted textbooks. OpenAI has taken down some of the bots but could face more complaints from rights holders.
Source: WIRED

AI companies like OpenAI are basically performing gain-of-function research on humanity
Underlying the AI push unfolding across the tech industry right now is a sentiment that machines are fundamentally better than humans.
Source: BGR

AI Companies Accused of Massive Copyright Infringement
AI companies are facing accusations of engaging in massive copyright infringement in the training and operation of their products. Large collections of pirated content have been used without permission or consent, leading to potential legal consequences.
Source: axios

Microsoft and OpenAI Reportedly Building $100 Billion Secret Supercomputer to Train Advanced AI
Source: futurism