THE NEW AGE OF FAITH
“Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.” Samuel Johnson
“There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear.” Vannevar Bush
Libraries, concordances, indexes, encyclopedias. In the last post we looked at the growth of information and the futile struggle first to collect it and then to store it, organize it, and find it.
To call the rate at which information is being created mind-boggling is an epic understatement. I spent $10,000 to chain together four external drives for my Apple II computer in the late 1970s so I could have a whole megabyte of data storage. A terabyte is a million megabytes, and just a few years ago that much storage would have required a whole rack of disk drives. Now I have it on a memory stick on my key chain. A petabyte is a thousand terabytes. The numbers are getting big enough to make your eyes glaze over, but, just for reference, a petabyte of storage would hold 799 million copies of Moby Dick. A thousand petabytes are added to the internet every day, and the internet now houses over a zettabyte—a million petabytes. The few hundred thousand volumes collected by the Ptolemies in the library at Alexandria are a grain of sand on the beach.
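The Moby Dick figure is easy to check for yourself. A back-of-envelope sketch, assuming a plain-text copy of the novel runs to roughly 1.25 megabytes (the exact size varies by edition):

```python
# Back-of-envelope check of the petabyte comparison. The 1.25 MB
# figure for a plain-text Moby Dick is an assumption, not a measurement.
moby_dick_bytes = 1.25e6          # ~1.25 million characters of text
petabyte = 1e15                   # a thousand terabytes
copies = petabyte / moby_dick_bytes
print(f"{copies / 1e6:.0f} million copies")  # ~800 million
```

Close enough to 799 million; the precise count depends on which edition you store.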
By the beginning of the 20th century, hard copy libraries were already inadequate. Computers looked like a solution in the second half of the century, but getting information into a data base required coding, and that required more than plain text and decimal arithmetic.
When he died in 1621, the English mathematician and astronomer Thomas Harriot left a clear description of arithmetic using base 2 instead of base 10. Harriot had been sent to survey the Virginia colony and may well have written his manuscript to pass the time during a long Atlantic crossing. Sadly, his paper remained undiscovered until J.W. Shirley stumbled on it in 1951.
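The base-2 arithmetic Harriot worked out by hand is the schoolbook repeated-division method. A quick illustration (a generic modern rendering, not Harriot's own notation):

```python
# Writing a decimal number in base 2 by repeated division by 2:
# the remainders, read in reverse, are the binary digits.
def to_binary(n: int) -> str:
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits  # remainder is the next lowest bit
        n //= 2
    return digits or "0"

print(to_binary(13))  # 1101, i.e. 8 + 4 + 1
```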
At about the same time Harriot was writing, Francis Bacon invented an alphabet using strings of just two letters—A and B. Bacon's system—in which A is AAAAA and T is BAABA, for instance—looks remarkably like ASCII, but there was no real use for it and the idea died of irrelevance. In 1703, Gottfried Wilhelm Leibniz proposed a binary system for weights and coinage to be used by German governments. Leibniz credited the Chinese king “Fohy” with having devised a similar system almost 4,000 years earlier. In the Chinese system a broken line stood for yin and an unbroken one for yang, but they might just as well have represented 1 and 0. As was the case with Bacon's binary alphabet, Leibniz generated no enthusiasm.
Only mathematicians, engineers, and science nerds had any interest in non-decimal number systems until 1946. As you might remember, even the ENIAC computer pictured above was a base 10 machine. It took John von Neumann to realize that computer switches were either on or off and that computer logic was intrinsically binary; it only made sense to program and store data in base 2.
The idea of using computers to manipulate data as well as store it came at about the same time. In 1945, Vannevar Bush published “As We May Think” in The Atlantic Monthly, describing how information might be handled in the future. He included the Memex, a machine designed to store a person's entire life, including every book, record, and written communication. All personal photographs would be digitized, and the user would wear a forehead-mounted camera attached to viewfinder eyeglasses to film every event. The Memex would store all of that on microfilm inside a desk fitted with a keyboard and multiple buttons, levers, and screens to retrieve and manipulate the data. None of the requisite technology existed in 1945, but by 2009 Gordon Bell of Microsoft was doing just what Bush had suggested with his MyLifeBits project. Virtually unlimited storage along with bulk scanners, wearable cameras and microphones, and location trackers made Bush's science fiction fact.
Meanwhile, Berkeley professor Guy Montgomery had started a concordance of the poetry of John Dryden. When Montgomery died in 1951, he left a quarter million index cards crammed into sixty-three boxes. Josephine Miles, who was teaching English at the university, took the cards to the Electrical Engineering Department, where there were IBM punched-card machines. The alphabetized terms, poem titles, and line numbers were put on punch cards, and the first computer-generated concordance was born.
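The Montgomery card file was, in modern terms, a mapping from each word to every place it occurs. A toy version of the idea (the poem titles and lines below are invented placeholders, not Dryden's text):

```python
# A miniature concordance: each word maps to (poem title, line number)
# for every line in which it appears.
from collections import defaultdict

poems = {
    "Poem One": ["the morning sun was bright and fair",
                 "the river ran beside the town"],
    "Poem Two": ["so fair a field was never seen"],
}

concordance = defaultdict(list)
for title, lines in poems.items():
    for number, line in enumerate(lines, start=1):
        for word in line.split():
            concordance[word].append((title, number))

print(concordance["fair"])  # every poem and line containing 'fair'
```

What took Montgomery decades of card-filing, and Miles a room full of punched-card equipment, is now a dozen lines of code.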
In 1963, Susan Artandi submitted her Rutgers PhD dissertation, “Book Indexing by Computer,” with support from the United States military. She proposed indexing unedited text directly against a pre-selected dictionary of search terms. The system was uniquely suited to the explosively growing and increasingly digitized scientific literature. When I was still writing scientific articles, I spent hours going through the Index Medicus, which is built on the system Artandi described. The problem is that such a system is restricted by the choice of search terms, and those are chosen by the editors. Even with the restricted term list, every year's edition of the Index filled multiple ponderous volumes with columns of tiny print. Literature searches were painful and most often incomplete.
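Artandi's scheme amounts to scanning raw text and recording a hit only when a word appears in the fixed dictionary. A minimal sketch of that idea (the terms and page texts here are invented, not from her dissertation):

```python
# Dictionary-restricted indexing: a page is recorded under a term only
# if that term is in the pre-selected list. Anything outside the list
# -- "surgical", say -- is invisible to the index.
search_terms = {"penicillin", "streptomycin"}

pages = {
    1: "penicillin resistance in staphylococci",
    2: "dosage of streptomycin and penicillin combined",
    3: "notes on surgical technique",
}

index = {term: [] for term in search_terms}
for page, text in pages.items():
    for term in search_terms:
        if term in text.split():
            index[term].append(page)

print(index)
```

The sketch also shows the weakness I ran into with the Index Medicus: page 3 never appears, because no editor put its vocabulary on the list.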
By mid-century, it was clear that there had to be a better way to access the mass of information being digitally stored, and the problem was about to become orders of magnitude more difficult. On August 6, 1991, Tim Berners-Lee announced the availability of a project that “aims to allow links to be made to any information anywhere.” Anyone interested just had to drop him a note, and he would forward the necessary code. The World Wide Web was born, and the sorting problem became a crisis.
In 1996, two Stanford graduate students—Sergey Brin and Larry Page—were working on BackRub, a project aimed at ranking search results more effectively than alphabetizing them or counting how often a search term appeared on a web page. They came up with PageRank (eponym intended), which ranked a page based on how many other sites linked to it and how important those linking sites were. That is much like the standard practice of ranking an academic paper by how often it is cited in other articles. Ranking by relevance was a revelation. Brin and Page saw the economic opportunity and left Stanford to found Google, and the idea of ranking by relevance to a specific task was picked up by GPS navigation, social media, and Netflix, among others.
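The core of PageRank fits in a few lines: each page repeatedly passes a share of its score to the pages it links to, with a damping factor spreading some score evenly so nothing gets trapped. A toy run on an invented four-page web (the 0.85 damping factor follows the published algorithm; the link graph is made up):

```python
# Power-iteration PageRank on a tiny hypothetical web.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

damping = 0.85
pages = list(links)
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # iterate until the scores settle
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# C is linked to by three pages, so it should come out on top
print(max(rank, key=rank.get))
```

Note that being linked from an important page counts for more than being linked from an obscure one, which is exactly the citation logic described above.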
So how does computer storage and retrieval of information compare to what goes on in the brain? Humans have three kinds of memory. Semantic memory includes meanings, definitions, and concepts and is what is typically contained in databases. Procedural memory covers skills like typing, riding a bicycle, or playing musical scales and is mostly not programmable. Episodic memory preserves past experiences, which do not typically go into databases but can be saved in audio and video recordings. Computers are best at mimicking semantic memory, and they do that best when retrieval is done in context. Ranked retrieval was an immense advance, but real contextual retrieval like ChatGPT's is a revolution. As Henry Kissinger said, contextualized information becomes knowledge.
Humans are incapable of looking at all the information stored on the internet, and they are orders of magnitude less able to find the relations between parts of that information, but today's fast computers can do all that. IBM's Watson has 16 terabytes of random access memory and can process the equivalent of a million books a second. Generative AI can scan a database with trillions of elements, find relationships among those elements, weight them, and predict an outcome in a few seconds. The machines do that on a scale incomprehensible even to their programmers. In traditional programming, each step is explicit and can be examined by the programmers. In generative AI, what goes on in a transformer's multiple layers is invisible, and programmers regularly have no idea why one outcome is preferred over another. The current revolutionary data analysis cares nothing for causation; it is interested only in correlation.
And that fundamentally changes our relation to information. The great philosophical leap of the Enlightenment was that a hypothesis that explained an observation turned it into knowledge. Data was collected in an experiment. A hypothesis was formed to explain the results and then used to predict new results. If the predictions proved accurate, the hypothesis was assumed to be correct. That knowledge could then be used to control outcomes. Nature could be explained without resorting to magic or faith, and in many cases could be controlled.
Generative AI is a different animal. We can collect and store more information than any human can possibly look at, and the new AI manipulates that data in ways we cannot understand. The result is an unbridgeable gap between knowledge and understanding. The computer tells us what it has learned, and we often have no idea how it learned it. We just have to trust the machine. In a very real sense, we are going from the rationality of the Enlightenment back to an age of faith; it is just that the higher power is a computer rather than a deity.
References. Here are a few of the books I relied on most heavily.
Duncan, Dennis, Index, A History of the: A Bookish Adventure from Medieval Manuscripts to the Digital Age. New York: W.W. Norton & Company, 2023.
Garfield, Simon, All the Knowledge in the World: The Extraordinary History of the Encyclopedia. New York: HarperCollins Publishers, 2022.
Glaser, Anton, History of Binary and Other Nondecimal Numeration. New York: Tomash Publishers, 1971.
Kissinger, Henry, Eric Schmidt, and Daniel Huttenlocher, The Age of AI and Our Human Future. New York: Little, Brown and Company, 2021.
O’Gieblyn, Meghan, God, Human, Animal, Machine: Technology, Metaphor, and the Search for Meaning. New York: Anchor Books, 2021.
Sejnowski, Terrence J., The Deep Learning Revolution: Artificial Intelligence Meets Human Intelligence. Cambridge, MA: The MIT Press, 2018.