OpenAI Transcribes Over A Million YouTube Hours: Navigating The Gray Area Of AI Data Use

    OpenAI reportedly used its Whisper audio transcription model to transcribe more than a million hours of YouTube videos, owned by Alphabet Inc. (GOOGL, GOOG), to train GPT-4.

    The effort, described as a way to work around the limited availability of training data, has stirred debate over the legality and ethics of such data-acquisition practices, The New York Times reported.

    According to the newspaper, OpenAI was aware of the legal uncertainty surrounding this method but believed it fell within the bounds of fair use. Greg Brockman, OpenAI's president, was personally involved in selecting the videos for transcription.

    In response to inquiries, OpenAI spokesperson Lindsay Held told The Verge that the company builds "unique" datasets for its models to help their "understanding of the world" and to stay competitive in global research.

    Held said OpenAI gathers data through several methods, including publicly available data, partnerships that provide access to non-public data, and research into generating synthetic data.

    The report comes amid growing concern within the AI industry over the availability of high-quality training data.

    The Wall Street Journal previously reported on a looming crisis in which AI companies could exhaust new sources of content by 2028, and pointed to synthetic data and curriculum learning as possible alternatives.

    Using large amounts of internet content, including YouTube videos, without explicit permission has sparked numerous legal and ethical disputes, underscoring the delicate balance AI developers must strike between innovation and copyright compliance.

    Photos: Shutterstock