AI Arms Race Volume: 12
Author: Brod Justice, Ryan McClure
Published Date: January 19, 2026
Legal cases against AI companies have so far failed to offer compelling evidence that the companies copy and store copyrighted material verbatim. However, a new paper from Stanford University blows the defence wide open.
The researchers were able to extract almost the entire text of well-known books (e.g. Harry Potter and the Sorcerer's Stone) from LLMs such as OpenAI's GPT-4.1, Grok 3 and Google's Gemini 2.5. We have replicated parts of their work and found that the result also appears to apply to Chinese and European AI models.
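The basic shape of this kind of extraction test can be sketched in a few lines: prompt a model with the opening words of a book, then measure how much of its continuation matches the original text verbatim. The code below is our illustrative approximation, not the Stanford team's actual methodology; the `model_generate` parameter is a hypothetical stand-in for a real LLM API call.

```python
def verbatim_overlap(generated: str, reference: str) -> int:
    """Length (in words) of the longest run of generated words that
    appears contiguously in the reference text."""
    gen = generated.split()
    ref = reference.split()
    best = 0
    # Classic O(n*m) longest-common-substring DP over word tokens.
    prev = [0] * (len(ref) + 1)
    for g in gen:
        cur = [0] * (len(ref) + 1)
        for j, r in enumerate(ref, start=1):
            if g == r:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def extraction_score(model_generate, book_text: str, prefix_words: int = 50) -> float:
    """Prompt the model with the book's opening words and report what
    fraction of its continuation is a verbatim run from the book."""
    words = book_text.split()
    prefix = " ".join(words[:prefix_words])
    continuation = model_generate(prefix)
    n = len(continuation.split())
    return verbatim_overlap(continuation, book_text) / max(n, 1)
```

In practice `model_generate` would wrap a real chat-completion call with greedy (temperature 0) decoding; a score near 1.0 for long continuations would indicate near-verbatim memorisation of the kind the paper reports.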
Until now, the AI companies had been largely successful in arguing that their models are not a database of copyrighted material. But this study provides strong technical evidence that the models effectively function as compressed archives, retaining near-verbatim copies of training data within their weights. Furthermore, the presence of specific guardrails in models like GPT-4.1 and Claude 3.7 strongly suggests that these companies are fully aware of this retention.

We are not lawyers, and this is only our opinion. However, with lawsuits mounting from rights holders, this research offers the technical evidence needed to argue that LLMs constitute unauthorized copies of the works they were trained on. The study was not able to extract verbatim text from lesser-known books, even classics like Catch-22, so one strategy for the AI companies might be to settle only those cases where extraction is proven. Some companies may claim that they have already paid.
A dilemma facing the courts might be that penalising American and European companies would hand a competitive advantage to Chinese companies that are unlikely to face any penalties.
We recently produced a short video explaining why, despite the hype, AI is not going to produce human-like intelligence any time soon. You can watch it on YouTube here or on LinkedIn here.
Researchers at MIT have been looking into this problem and released a paper introducing Recursive Language Models (RLMs), which appear to significantly mitigate it, or at least make it up to 3x cheaper to run AI on many difficult problems. If you want an easy overview of the technique, the 17-minute Matthew Berman video here does a good job – though the title, "MIT Researchers DESTROY the Context Window Limit", is somewhat hyped, as this does not solve the fundamental AI memory problem.
New to AI? Take Our Free Course.
ChatBar AI Learning is the fast, practical way for business owners and teams to get up to speed on real-world AI.
Want More Like This? Just Ask.
Tap below and ask ChatBar to recommend more high-signal AI content and use cases – no fluff, no noise.
Ask ChatBar to show similar content
Ready to Try ChatBar AI on Your Site?
Apply now to join our early access program and see how ChatBar can supercharge your site’s content, discovery, and engagement.
© 2026. All rights reserved.