Gated Knowledge Is Making Research Harder Than It Needs to Be

Tracking down facts requires navigating a labyrinth of paywalls and broken links.

I have recently been working on finishing the endnotes for a book I co-wrote with Noam Chomsky, which is now being prepared for publication. The book, The Myth of American Idealism (pre-order now!), is an exposé of the way that the U.S. has presented its worst crimes as acts of idealism and benevolence. As with most Chomsky works, it draws from a vast range of source material, including old newspaper articles, academic journals, long out-of-print books, editorials written in Hebrew in Israeli periodicals decades ago, and internal U.S. government memos. One of my tasks was to check that all the endnotes were in shape, which meant tracking down the original sources for thousands of factual claims.

This would be laborious no matter what, but it was made far harder by the fact that the internet is rotting, as Jonathan Zittrain noted in an important (but paywalled) 2021 Atlantic article. A huge percentage of the links on the internet are broken, and there is no single authoritative, accessible universal repository that keeps track of everything. It is frighteningly easy for crucial information to slip away. 

Let me give a few examples of how elusive the record can be. In a chapter on climate change in the new book, we quote a Wall Street Journal headline saying that the Inflation Reduction Act would be a “boon for [the] fossil-fuel sector.” Now, let’s say you wanted to verify whether we’re right that the Wall Street Journal’s headline said that. You might type the quote into Google: “boon for fossil-fuel sector.” The only results you’d get (as of 4/8/24) are two instances of the World Socialist Web Site saying that the Wall Street Journal ran this headline. But it might strike you as suspicious that the Wall Street Journal itself doesn’t come up. If they ran this headline, why aren’t they a result? Can the World Socialist Web Site be trusted to accurately recount what the Journal said? 

Now, if you go to Google Image Search, you’ll find a picture of a print Wall Street Journal article with that headline. We’re getting closer! But get this: that picture is from my own Twitter feed. I snapped that picture of my own newspaper as I read it one morning. And that’s the only Google-able result showing some form of evidence that the Wall Street Journal ran this headline. That’s because it turns out that the Journal ran a different, much less powerful headline for its online version: “Inside Climate Bill, a Broad Energy Push.” 

This isn’t the only time something that appeared in a print Wall Street Journal has vanished entirely from the internet. In 2021, after a major winter storm resulted in Texas residents receiving huge unexpected energy bills, the print edition of the Journal carried a quote from an energy economist who said there was going to be “an incredible transfer of billions of dollars from Texas consumers to [power] generators.” I thought that quote was striking, and I cited it in a Current Affairs article. But it didn’t appear in the online version of the paper, and if you google the quote, the only results that come up are my original article and my re-use of the quote in a book. In other words, if I hadn’t written about it, this useful quote on an important event would have been memory-holed. The online Journal doesn’t indicate that changes were made to the article, and to find the quote you’d have to know that the print edition differed and then track down a scan of the actual print version.

The practice of making changes to an article without noting that you’ve made them is called “stealth editing,” and even the New York Times does it. They once “stealth edited” an article about Bernie Sanders after it had been published to make it more negative in tone. That change was called out at the time by the paper’s “Public Editor,” an ombudsman position that has since been abolished. 

The existence of stealth editing means that it’s difficult to trust that the version of an article you click on at any given moment is the article as it was originally published. That’s why it’s critically important that there be an authoritative database, publicly accessible, that tracks versions at a given point in time and that is trusted to possess authentic records. Otherwise, if I tell you that “the Wall Street Journal described the bill as a ‘boon’ for the fossil fuel industry,” and you doubt me, unless we have a paper copy on hand, how are we going to resolve our dispute?

As I worked on endnotes for the Chomsky book, I found that it could be really difficult to definitively verify that something had in fact been published as quoted. I also, to my alarm, realized just how dependent we are on private publications themselves to give us access to records of their own work. Often, they keep it paywalled behind locked gates and charge you admission if you want to have a look. There are lots of sources in the Chomsky book that you have to subscribe to if you want to verify them, such as a 1999 story in the Los Angeles Times about NATO’s bombing of a bus in Yugoslavia. This is a story of national importance, far too overlooked at the time, but if you don’t subscribe to the LA Times, you need research library access or a workaround if you want to read it.

Thank God for the Internet Archive, whose Wayback Machine preserves as much of the internet as it can and is invaluable for researchers trying to figure out what was once housed at now-dead links. But the Internet Archive has its limits. Social media posts, YouTube videos, paywalled Substack posts, and PDFs can all be very difficult to track down after they disappear. If a politician tweets something embarrassing, for instance, and then deletes it, it might be preserved in a screenshot. But we know screenshots are easy to fake. So where do you turn to prove satisfactorily that something was in fact said?

It scared me, doing the research for Myth, just how ephemeral the internet is. I quoted from something an academic said in a YouTube video at one point, for instance, but if the video were taken down tomorrow I don’t know how I’d prove he ever said it. I went on long chases through labyrinths of links to try to find long-dead sources. I occasionally had to deploy the services of rather legally dubious archive sites like Sci-Hub and Lib Gen, because there was simply no easy way to access an out-of-print book. 

The crazy thing, of course, is that all of this could be solved. The obstacles here are legal, not technical. We could easily have, for instance, a database of all human knowledge, all films, articles, books, and posts, available to verify anything. But building such a database would get you in huge trouble with copyright law. Last year, the Internet Archive took a major hit after being successfully sued by publishers over its book lending program, even though it’s absolutely indispensable to preserving the record of human knowledge.

Jonathan Zittrain, in his Atlantic piece, noted that it’s very easy to lose pieces of information that seem permanent. E-books, for instance, can be changed by their publisher without the changes even being noted. You might read a book on your Amazon Kindle one day and open it up the next day to look for a quote only to find that the quote has disappeared without a trace. The Guardian, for twenty years, hosted a copy of Osama bin Laden’s “letter to the American people,” an important historical document. After the letter went viral on TikTok, the Guardian removed it from the site entirely. The New Republic did the same after an article of theirs about Pete Buttigieg caused controversy. The documents in question can still be found, but only by digging through the Internet Archive. If that ever goes down, researchers will find that trying to piece together the online past is like trying to learn about a lost civilization from excavated fragments. 

The sources in Myth of American Idealism were all tracked down and verified. But I was distressed by how difficult it would be for the average person to do the same. I think that in an age where people (rightly) don’t trust the information they’re getting to be true, it needs to be as easy as possible to do research. Instead, while we have better technology than ever for sifting through information, it’s still the case that the truth is paywalled and the lies are free. If you want to “do your own research” to check on the veracity of claims, you will run headlong into a maze of broken links, paywalls, and pop-ups. How can anyone hope to find the truth when it’s so elusive, trapped behind so many toll gates? 
