
Nvidia and other tech giants exposed for illegally using YouTube data to train AI models, involving more than 170,000 videos

六月清晨搅

According to media reports, several large tech companies, including Apple, Nvidia, Salesforce, and Anthropic, have been exposed for training their AI models on data taken without authorization from YouTube, Google's video platform. The companies used a dataset supplied by a third party that contains a large volume of subtitle text crawled from YouTube, in violation of YouTube's ban on unauthorized crawling of content from the platform. The report says all of these companies used a dataset called "YouTube Subtitles" when training their AI models. The dataset is 5.7 GB in size and contains 489 million words from 173,500 videos across more than 48,000 YouTube channels. It consists of plain-text video subtitles, including both subtitles uploaded by video creators and text transcribed automatically by YouTube. In addition to English, the subtitles often come with translations into languages such as Japanese, German, and Arabic.
