Mark Graham, the director of the Wayback Machine at the Internet Archive, has published a post on TechDirt, regarding the recent concerns about AI companies scraping the internet using the Wayback Machine.
He emphasizes that these concerns are understandable, but they’re targeting the wrong problem. Graham states that the Internet Archive does not function as a bulk data scraping pipeline for AI companies. The service is only built for human access and research use.
There are built-in technical controls that block abusive automation from AI companies. The Wayback Machine consists of rate limits, traffic monitoring, and filtering. These already stop AI companies from large-scale data extraction. According to him, treating the Internet Archive as a scraping tool ignores how the system actually operates.
Graham also argues that online libraries are being blamed for an economic conflict created by AI platforms. The archive does not sell datasets or license news content for the training of AI models. The whole point of the Internet Archive is to preserve web pages, so journalists, courts, and researchers can cite records. Blocking this by not allowing the Wayback Machine to access websites will result in the removal of a long-standing layer of accountability that the open web has depended on for decades.
He further argues that news sites have coexisted with web archiving for almost 3 decades, with no major disruption. The current wave of restrictions is due to commercial AI model training, and it’s not because of any shifts in the behavior of the Internet Archive.
Replacing the open preservation of the internet will result in the past disappearing through paywalls, or worse, shutdowns. Publishers want leverage in licensing and tighter control over how their work enters AI systems.
It’s not just publishers pushing back on the Internet Archive. Microblogging platforms such as Reddit have restricted a large portion of its content from Wayback Machine access. Posts, comments, and profiles can no longer be archived, and it can only access the homepage. As a result, the web is only easily visible in the present, but it will be harder to study its past a few years later.
The Internet Archive is a non-profit, public library of information, ruling out any financial motives. He concludes by saying that systems can always be improved, but urges the rest of the internet to continue working together to preserve its past.
We’ve used the Wayback Machine to bust myths recently, like the controversy surrounding TikTok’s privacy terms. So it does serve a very important role on the internet in its own right.
Share your thoughts about this situation in the comments below.

