Comprehensive guide to the intersection of artificial intelligence and intellectual property law — training data copyright, AI-generated content ownership, patent eligibility, and the global regulatory response to generative AI.
Last updated: February 2026
Generative AI has created the most significant challenge to intellectual property law since the internet. Two fundamental questions dominate the global debate:
Input: Is training AI models on copyrighted works — without permission or payment — legal?
Output: Can AI-generated content be protected by copyright, and if so, who owns it?
These questions are being answered differently across jurisdictions, creating a patchwork of approaches that affects every AI developer, content creator, publisher, and user worldwide.
Stakes: The economic implications are enormous. The creative industries generate trillions in annual revenue. AI companies have trained models on vast corpora of copyrighted text, images, music, and code. Litigation seeking billions of dollars in damages is pending. The outcome will determine whether AI training is treated as transformative fair use, as a use that requires licensing, or as outright infringement.
2. Training Data & Copyright
2.1 The Technical Process
AI model training involves ingesting and processing copyrighted works at massive scale:
Data collection: Web scraping, licensed datasets, public domain corpora, user-uploaded content
Processing: Tokenization, embedding, and statistical pattern extraction
Storage: Models usually do not store verbatim copies of works; they encode statistical patterns that can nonetheless sometimes reproduce or closely approximate training data
Key datasets: Common Crawl, The Pile, LAION-5B, Books3 (controversial), C4
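To make the processing step more concrete, here is a minimal sketch, assuming a toy whitespace tokenizer and simple bigram counts as a stand-in for real tokenization and training. Production systems use subword tokenizers and neural networks rather than count tables; the function names `tokenize` and `bigram_counts` and the two-sentence corpus are illustrative assumptions, not anything drawn from an actual training pipeline.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Toy tokenizer: lowercase and split on whitespace.

    Assumption: real models use subword tokenizers, not whitespace splitting.
    """
    return text.lower().split()


def bigram_counts(corpus: list[str]) -> Counter:
    """Count adjacent token pairs within each document.

    A crude stand-in for 'statistical pattern extraction': the result
    holds aggregate counts, not the documents themselves.
    """
    counts: Counter = Counter()
    for doc in corpus:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical two-document corpus for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
]
counts = bigram_counts(corpus)
print(counts[("the", "cat")])  # 2
```

The count table contains no document as such, yet sequences that recur often enough in the corpus can be regenerated from it. That is roughly why the storage question above is legally contested.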
2.2 The Core Legal Question
| Position | Argument | Proponents |
|---|---|---|
| Training is fair use / permitted | Training is transformative (creates new capability, not copies); no market substitution (model is not a copy of any work); analogous to human learning | AI companies (OpenAI, Google, Meta, Stability AI); some legal scholars |
| Training requires licensing | Mass copying of copyrighted works requires permission; market harm (AI outputs compete with originals); scale distinguishes from human learning | Publishers (NYT, Getty); authors (Authors Guild); music industry (RIAA/UMG); visual artists |
| Training is infringement | Unauthorized reproduction at scale; derivative work creation; no existing exception covers this use | Some rights holders; some jurisdictions without broad fair use |
3. US Copyright Law & AI
3.1 Fair Use Doctrine (17 U.S.C. § 107)
The US fair use defense is the central legal battleground for AI training. Courts consider four factors:
| Factor | Application to AI Training | Likely Outcome |
|---|---|---|
| 1. Purpose and character of use | Is AI training "transformative"? It creates a new tool rather than substituting for the original works. Commercial purpose weighs against fair use. | Contested. Transformativeness is the key question; Google v. Oracle (2021) and Andy Warhol Foundation v. Goldsmith (2023) pull in different directions |
| 2. Nature of the copyrighted work | Training uses both factual and highly creative works | Mixed. Creative works get stronger protection |
| 3. Amount and substantiality | AI training typically copies entire works | Weighs against fair use, but the Google Books decision (Authors Guild v. Google, 2d Cir. 2015) held that copying entire works can be fair use when the use is transformative |
| 4. Effect on the market | Do AI outputs substitute for the original works? Is there a licensing market that AI companies are bypassing? | Most contested factor. Depends on whether AI outputs compete with training data sources |
3.2 US Copyright Office Guidance
Registration guidance (March 2023): AI-generated content is not copyrightable without significant human authorship. Works must reflect human creative control, not just AI prompting.
Zarya of the Dawn (2023): For a graphic novel illustrated with Midjourney images, the USCO ruled that the text and the selection and arrangement of elements were copyrightable but the individual AI-generated images were not
Notice of Inquiry (2023): USCO requested public comments on AI and copyright — received 10,000+ submissions covering training, outputs, and policy
Report on AI (2024-2025): USCO published a multi-part report analyzing digital replicas, output copyrightability, and training data copyright, with recommended legislative approaches
3.3 Proposed US Legislation
| Bill | Sponsor(s) | Key Provisions | Status |
|---|---|---|---|
| AI SHIELD Act | Rep. Eshoo | Require AI companies to disclose training data; create licensing framework | Introduced (pending) |
| COPIED Act | Bipartisan | Require consent and compensation for use of copyrighted works in AI training | Introduced (pending) |
| NO FAKES Act | Bipartisan Senate | Protect individuals from AI-generated replicas of their voice or likeness | Introduced (pending) |
| Generative AI Copyright Disclosure Act | Rep. Schiff | Require AI developers to disclose copyrighted works used in training | Introduced (pending) |
4. EU Copyright & AI
4.1 Text and Data Mining Exception
The EU has the most developed legal framework for AI training data, through the Digital Single Market Directive (DSMD, 2019/790):
| Article | Scope | Conditions | AI Training Impact |
|---|---|---|---|
| Article 3 | TDM for scientific research | Research organizations and cultural heritage institutions with lawful access | Academic AI research can mine copyrighted works freely |
| Article 4 | General TDM exception | Anyone with lawful access; subject to rights holder opt-out (machine-readable reservation of rights) | Commercial AI training is permitted UNLESS the rights holder has opted out |
4.2 The Opt-Out Mechanism
robots.txt and the Opt-Out: Article 4 of the DSMD allows rights holders to reserve their rights against TDM (opt out). For content made publicly available online, the reservation must be expressed in a machine-readable format. Whether robots.txt constitutes a valid opt-out is still debated. Meanwhile, major publishers and media organizations are implementing AI-specific opt-out signals. The EU AI Act (Article 53) requires providers of general-purpose AI models to put in place a policy to respect these opt-outs and to publish summaries of training data.
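As an illustration of what a machine-readable signal looks like in practice, the following is a minimal sketch, assuming a crawler treats robots.txt as an opt-out signal. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `may_crawl` helper, and the example URL are hypothetical; `GPTBot` appears only as an example of an AI-crawler user-agent token. Honoring robots.txt in this way does not settle whether it is a legally valid reservation under Article 4.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler, allows all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/article"))        # False
print(may_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article"))  # True
```

A check like this only covers the robots.txt convention. Other reservation signals (site terms, metadata, dedicated AI opt-out protocols) would need separate handling.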
4.3 EU AI Act Transparency Requirements
Article 53(1)(c): Providers of GPAI models must put in place a policy to comply with EU copyright law, in particular the opt-out under Article 4 DSMD
Article 53(1)(d): Must draw up and make publicly available a sufficiently detailed summary of the content used for training the model
Template: The AI Office published a template for training data summaries in July 2025
5. UK Copyright & AI
5.1 Current Law
Section 29A CDPA (1988): Existing TDM exception limited to non-commercial research. Does NOT cover commercial AI training.
Computer-generated works (s.9(3)): Uniquely, UK law provides copyright protection for computer-generated works with no human author — authorship is attributed to the person who made the arrangements for its creation. This 1988 provision (predating modern AI) could apply to AI outputs.
5.2 Failed Reform Attempt
The UK government proposed a broad TDM exception for AI training in 2022 but withdrew it after fierce opposition from the creative industries. The proposal and its fate:
Scope: Would have allowed TDM for any purpose (including commercial AI training) given lawful access
Comparison: Similar to EU Article 4 but without the opt-out mechanism
Opposition: The creative sector argued it would devastate their industries
Outcome: The government withdrew the proposal and opted for a voluntary code of practice approach
5.3 Current Approach
Code of Practice: UK IPO convening stakeholders to develop voluntary licensing frameworks between AI companies and rights holders
Transparency: AI companies expected to disclose training data sources
Legislative uncertainty: Without reform, commercial AI training in the UK likely requires licensing (no fair use doctrine in UK law)
6. Other Jurisdictions
| Jurisdiction | AI Training Data Law | AI Output Copyright | Key Developments |
|---|---|---|---|
| Japan | Most permissive: 2018 Copyright Act amendment (Art. 30-4) allows reproduction for computational analysis regardless of purpose. Training on copyrighted works broadly permitted. | No AI authorship; human involvement required | Government considering whether to narrow the exception after creator backlash; cultural industry concerns |
| China | No specific TDM exception; fair use is narrow. Beijing Internet Court ruled (2024) that unauthorized AI training may infringe copyright. | Beijing court ruled (2023) AI-generated images can be copyrighted if human has sufficient creative control | Rapidly evolving; courts taking case-by-case approach; generative AI regulations require training data compliance |
| Canada | Fair dealing (narrower than US fair use); no specific TDM exception. Government consulting on AI and copyright reform. | No AI authorship under current law | Copyright Board studying AI issues; legislative reform expected |
| South Korea | Limited fair use; no specific TDM exception. Copyright Act reform under discussion. | No AI authorship | Korean Copyright Commission studying AI training; industry consultations ongoing |
| Brazil | No specific TDM exception. AI Bill includes provisions on training data. | Under discussion in AI Bill | AI Bill (PL 2338/2023) includes training data transparency requirements |
| India | Fair dealing; narrow exceptions. No specific TDM provision. | Copyright Board has not addressed AI authorship | Delhi High Court considering AI and copyright cases; reform discussions ongoing |
| Australia | Fair dealing (narrow). No TDM exception. | No AI authorship (Copyright Act requires human author) | Australian Law Reform Commission recommended TDM exception; government considering |
7. Major Litigation
| Case | Jurisdiction | Parties | Claims | Status / Significance |
|---|---|---|---|---|
| NYT v. OpenAI/Microsoft | US (S.D.N.Y.) | New York Times vs. OpenAI & Microsoft | Copyright infringement; unfair competition; training on NYT articles; ChatGPT reproducing NYT content | Pending; most high-profile AI copyright case; NYT demonstrated verbatim output reproduction; seeking billions in damages |
| Authors Guild v. OpenAI | US (S.D.N.Y.) | Authors Guild + individual authors vs. OpenAI | Training on copyrighted books (Books3 dataset) without permission | Pending; class action; represents thousands of authors |
| Getty v. Stability AI | US (D. Del.) & UK | Getty Images vs. Stability AI | Training Stable Diffusion on 12M+ Getty images; outputs reproduce watermarks | Pending in both jurisdictions; visual arts case |
| Andersen v. Stability AI et al. | US (N.D. Cal.) | Artists vs. Stability AI, Midjourney, DeviantArt | Training image generators on copyrighted art without consent | Partially dismissed; amended complaint proceeding; first artist class action |
| Thomson Reuters v. ROSS Intelligence | US (D. Del.) | Thomson Reuters vs. ROSS Intelligence | Training legal AI on Westlaw content | Court granted partial summary judgment for Thomson Reuters and rejected ROSS's fair use defense (February 2025); significant for legal AI |
| UMG/RIAA v. AI Music Companies | US | Major record labels vs. Suno, Udio | Training music AI on copyrighted recordings | Filed 2024; music industry copyright claims; seeking $150K per infringed work |
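The per-work figure in the music cases reflects how US statutory damages scale with the number of works infringed. As a rough worked example, the sketch below multiplies the statutory range in 17 U.S.C. § 504(c) ($750 to $30,000 per work, up to $150,000 per work for willful infringement) by a hypothetical number of works. The `exposure_range` helper and the figure of 1,000 works are illustrative assumptions, not numbers from any pending case. Actual awards are set by the court or jury within the range.

```python
# Statutory damages per infringed work under 17 U.S.C. § 504(c), in US dollars.
STATUTORY_MIN = 750
STATUTORY_MAX = 30_000
WILLFUL_MAX = 150_000


def exposure_range(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return the (minimum, maximum) statutory damages exposure for num_works.

    Damages are counted per work infringed, not per copy or per output.
    """
    upper = WILLFUL_MAX if willful else STATUTORY_MAX
    return num_works * STATUTORY_MIN, num_works * upper


# Hypothetical claim covering 1,000 works with willful infringement alleged.
low, high = exposure_range(1_000, willful=True)
print(f"${low:,} to ${high:,}")  # $750,000 to $150,000,000
```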
8. AI Output Copyright
| Jurisdiction | Purely AI-generated works protected? | AI-assisted works protected? | Key authority |
|---|---|---|---|
| United States | No; requires human authorship | Yes, if human contribution is sufficient (not just prompting) | USCO guidance (2023); Thaler v. Perlmutter (D.D.C. 2023) |
| European Union | No; CJEU requires author's own intellectual creation | Yes, if human creative choices are reflected | Infopaq (CJEU); Painer (CJEU) |
| United Kingdom | No human author required for computer-generated works (s.9(3) CDPA) | Yes; uniquely, UK law protects computer-generated works | CDPA 1988 s.9(3); authorship goes to person making arrangements |
| China | Evolving; court ruled human creative control sufficient | Yes, if human demonstrates creative involvement | Beijing Internet Court (2023), AI image copyright case |
| Japan | No; requires human thought or emotion | Yes, if human creativity is substantial | Copyright Act requires human author |
| Canada | No; requires human author under Copyright Act | Likely yes if human contribution is significant | No AI-specific case law yet |
9. AI & Patents
9.1 Can AI Be a Patent Inventor?
Whether an AI system can be named as an inventor on a patent has been tested globally through the DABUS (Device for the Autonomous Bootstrapping of Unified Sentience) cases:
| Jurisdiction | Ruling | Court/Authority |
|---|---|---|
| United States | No: inventor must be a natural person (Thaler v. Vidal, Fed. Cir. 2022) | Federal Circuit; affirmed USPTO position |
| United Kingdom | No: inventor must be a person (Thaler v. Comptroller-General, UK Supreme Court 2023) | UK Supreme Court; unanimous |
| European Patent Office | No: inventor must be a natural person | EPO Boards of Appeal |
| Australia | No: initially yes (Federal Court, 2021), reversed on appeal (Full Federal Court, 2022); inventor must be human | Full Federal Court of Australia |
| South Africa | Yes: granted patent with AI inventor (no substantive examination; formality-based system) | CIPC (2021); not precedent-setting due to process |
9.2 AI-Assisted Inventions
The Practical Question: While AI cannot be named as an inventor, AI-assisted inventions (where a human uses AI as a tool) can be patentable. The USPTO issued guidance (February 2024) confirming that AI-assisted inventions are not automatically unpatentable, but that a natural person must have made a "significant contribution" to the invention. The result is a spectrum from unpatentable (purely AI-generated) to patentable (AI-assisted with significant human contribution).
10. Comparative Analysis
| Dimension | USA | EU | UK | Japan |
|---|---|---|---|---|
| Training Data | Fair use (case-by-case; pending litigation) | TDM exception with opt-out (Art. 4 DSMD) | No commercial TDM exception; licensing expected | Broad TDM exception (Art. 30-4) |
| AI Output Copyright | No; requires human authorship | No; requires human intellectual creation | Yes; computer-generated works provision (s.9(3)) | No; requires human author |
| AI as Patent Inventor | No (Thaler v. Vidal) | No (EPO) | No (UK Supreme Court) | No |
| Transparency Req. | Proposed bills (pending) | AI Act Art. 53 (training data summaries) | Voluntary code of practice | Under discussion |
| Approach | Litigation-driven (courts deciding) | Legislative (DSMD + AI Act) | Voluntary / market-based | Legislative (permissive exception) |
11. Trends & Future Outlook
Litigation Will Set US Law
The NYT v. OpenAI and Authors Guild v. OpenAI cases will likely reach appellate courts by 2026-2027. Their outcomes will establish whether AI training constitutes fair use under US law — a precedent with global implications given the dominance of US AI companies.
Licensing Frameworks Emerging
Major licensing deals are already being struck: OpenAI with AP, Axel Springer, Le Monde; Google with Reddit; various AI companies with stock photo agencies. A licensing market for AI training data is forming, though terms and economics remain highly contested.
Regulatory Divergence
Japan’s permissive approach, the EU’s opt-out regime, and the UK’s uncertainty create regulatory arbitrage opportunities and compliance challenges for global AI companies. Pressure for international harmonization is growing but no convergence is imminent.
Deepfakes & Right of Publicity
AI-generated replicas of real people (voice clones, digital likenesses) are driving new legislation (NO FAKES Act in US, various state laws) and raising questions at the intersection of copyright, right of publicity, and personality rights.