“Speech is a river of breath, bent into hisses and hums by the soft flesh of the mouth and throat.”
Steven Pinker
Poetic, but not quite that simple. That river comes from the lungs through folds in the larynx (“vocal cords”) and then to the back of the throat. If you are one of the click speakers like the African Khoisan, you can generate a popping sound with the soft palate. If you are the rest of us, you change the air flow by pushing your tongue against your hard palate: “k” or hard “c” as in candy at the back of the tongue, “g” as in good a bit more forward, “d” as in dog in the middle, and “t” as in toy at the front. You can use similar tongue positions to end a syllable, as in hike, hog, and hid. You can also leave a bit of space between the tongue and palate and push air through with a hiss, “sh” as in show, or “s” as in similar if the tongue is a bit farther forward. You can also move the air all the way forward to your lips to get “m” as in many or “b” as in board. Vowels are air movements that change in tone as air is pushed through. Try it.
English speakers typically open and close the mouth three to eight times a second, producing on average five vowels and five consonants, but the sounds are not all there is. The meaning of the sounds changes with pitch (Really? Really!!), with accompanying gestures, or with the context from the words around them.
The number of vocal sounds (phonemes) varies from language to language. English has 44, the Khoisan click languages have as many as 141, and the Rotokas of Pacific Bougainville have only 11. Did those oral gymnastics arise just so we could talk? Probably not. Exaptation is the idea that we use anatomy (and brain circuits as well) that evolved to do one thing (eat or breathe) to do something else (speak), but why were humans the only animals to make that adaptation?
To answer that, we can start with our closest living primate relatives, the chimpanzees (Pan troglodytes) and the closely related bonobos (Pan paniscus). The lineage leading to genus Homo split off from theirs around six million years ago, and comparing us to them makes the questionable assumption that genus Pan has not evolved in that time. Nonetheless, 96 percent of our DNA sequences are the same, and it is worth looking at how close to speech non-human primates have come. An intriguing bit is that chimpanzees, bonobos, gorillas, and orangutans all show a right-hand preference, although significantly less strongly than humans. That said, Broca’s area, which you will remember is crucial to generating speech, is six times as large in humans as it is in chimps. Chimpanzees do have a repertoire of calls signifying warning, territorial claims, and the like, but the calls are entirely holistic. None breaks down into parts, and most originate from the emotional tracts of the limbic system and the brain stem, not from the frontal lobes. Chimps do not have the same repertoire of sounds as humans, and the sounds they do have lack the complexity of real speech.
The oldest known transitional form between apes and humans was Sahelanthropus, about six million years ago. Next came the australopithecines (among them the A. afarensis specimen nicknamed “Lucy,” who has been called the mother of humanity), which were bipedal and lasted from about 4 million to 2 million years ago. Australopithecines had only about one-third the brain volume of later Homo species, and it seems unlikely that they had much more speech than earlier primates.
Next in line was Homo habilis, which lived from about 2.5 to 2 million years ago. Whether they had speech, or if they did, what kind, is unknown, but H. habilis were toolmakers, and flakes left from their toolmaking suggest that the tools were made with the right hand. Their Broca’s area was significantly larger than that of chimpanzees even though their whole brain measured only 550 to 800 cubic centimeters.
The next step brings us closer to real speech. Homo erectus emerged in Africa about 1.9 million years ago and was the first of our human ancestors to move out of that continent. H. erectus was in Georgia, in the Caucasus, around 1.8 million years ago, in Java and China 1.6 million years ago, and may have been in southern Europe 1.5 million years ago. They overlapped on one side with H. habilis and on the other with Homo sapiens, lasting until just 110,000 years ago. Besides being travelers, H. erectus were formidable toolmakers. Their innovations (Acheulean stone tools) included bifacial hand axes and cleavers. It seems plausible that a species that lasted close to 2 million years, that built villages, that designed tools, and that traveled over half the planet must have had speech. Although we cannot prove they spoke, it is hard to imagine passing those skills from generation to generation without language.
Before we go any further, it is necessary to deal with an alternative theory for the origin of speech. The linguist Noam Chomsky, a professor at the Massachusetts Institute of Technology and the University of Arizona, has vehemently argued that human speech came as a result of a spontaneous genetic mutation about 60,000 years ago that caused the brain to suddenly grow a language center. That theory led to a decades-long search for a language gene that has proven almost entirely futile. The current academic consensus is that language evolved gradually over tens of thousands of generations and that no single gene is responsible for its evolution.
The last evolutionary step was Homo heidelbergensis, which branched into several species (H. neanderthalensis, H. floresiensis, and the Denisovans) that overlapped with H. erectus and Homo sapiens and even interbred with the latter. After emerging between 750,000 and 550,000 years ago, all of these lineages except Homo sapiens had become extinct by about 40,000 years ago, and it is likely they all possessed some degree of speech. Neanderthal brains averaged about 1,400 cubic centimeters, some 10 percent larger than those of modern humans, and the genetic makeup of Neanderthals and modern humans is strikingly similar. Only 96 identified genetic mutations affecting proteins differentiate H. sapiens from their predecessors. One of those produces transketolase-like protein 1 (TKTL1), which indirectly increases the production of neurons in the frontal lobes. It is not the “language gene,” but TKTL1 probably has a role in making the brain capable of speech. The important thing to remember is that the brain went from about 500 cubic centimeters to 1,300 over three million years. At roughly 24 years per generation, that is about 125,000 generations, ample time for evolution to occur.
The oldest Homo sapiens fossils date to about 300,000 years ago and have been found throughout the African continent. Their finely made flint blades and points were attached to handles and shafts for the first time. Between 200,000 and 100,000 years ago, H. sapiens spread across Asia and Europe. Their brains were modern size, four times the volume of a chimpanzee brain, and had especially large frontal lobes. H. sapiens could talk; now let’s look at how that happens.
We have looked at how speech might have arisen based on culture and brain size. Another way to look at the problem relies on the “ontogeny recapitulates phylogeny” trope. The basic idea is that what an organism goes through as it matures mirrors what happened when its species evolved. The problems with that are obvious, but it is still worth considering what happens with human infants and children as they learn to talk.
First, no one is born knowing language, a fact that led to a couple of intriguingly harsh experiments. The Greek historian Herodotus claimed that Egyptian king Psamtik I (664-610 BCE) locked two newborns in a shepherd’s hut with no exposure to anyone but the hut’s mute owner. The idea was that the first words the infants spoke, having been exposed to no words, would reflect the original language of the human race. One of the infants was thought to have uttered “bekos,” the Phrygian word for bread, convincing Psamtik and succeeding generations that Phrygian was the first language. King James IV of Scotland tried a similar experiment with Scottish infants, who reportedly emerged speaking “very guid Ebrew.”
Besides Phrygian and “Ebrew,” the number of possible languages for an infant to learn is not small. There are currently more than 7,000 languages spoken on Earth, although 96 percent of the population speak only 4 percent of those. About 1,750 of them have fewer than 1,000 speakers each, and one (Resígaro, an Arawakan language of Amazonian Peru) has a single speaker. Indo-European, which gave rise to 400 languages including most European languages, Bengali, Kurdish, and Farsi, appeared around 4500 BCE. Every one of those had to be learned.
The rate at which children learn to speak is staggering. The average child starts the second year learning two to four words a day and by the end of that year is learning ten to twenty. The average five-year-old has a 10,000-word vocabulary.
There is a sequence in which children learn that may well reflect how speech originally developed. Children first learn iconic words that sound like what they name (smash, boom), followed by those that name objects or shapes. Next come the simple actions those objects perform (“Mommy go”). Paul Ekman has suggested that we are born with a basic set of emotions (anger, sadness, and happiness, for instance), and we learn to attach words to those feelings early on. Later, children learn to understand and voice what the people around them want. This understanding of beliefs and desires is called theory of mind, and difficulty with it may be the specific deficit of those on the autism spectrum. Children then learn that concepts including physical entities, relationships, emotions, and ideas come in hierarchies: some dogs are poodles; some poodles are miniatures; my miniature poodle is Frazier.
Chomsky argued that anything as complex as language had to be entirely inherited because it was just too complicated to learn, but he was thinking of the brain as a serial processor in which each step had to be completed before its result determined the next step. Terrence Sejnowski recognized that was wrong. The brain runs multiple processes in parallel and adjusts its output based on prior experience and statistical likelihood. The more it learns, the better it gets. That training has one counterintuitive anatomic correlate. At birth, each neuron has about 2,500 synapses. By three years of age that number has climbed to about 15,000, and in adults it is pruned to roughly half of that. As the brain is trained, it sheds connections and becomes more efficient.
So where are we with all of that? First, what we know about how speech started is by implication only. We have no direct evidence of events for which we have no written record. Second, speech probably came slowly and over a long enough time to be part of evolution. Third, even though some of the mechanisms for speech come from our DNA, we still have to learn language, and that learning fundamentally changes our brains.
There is one more thing. Of the seven classical liberal arts (grammar, rhetoric, logic, arithmetic, music, geometry, and astronomy), rhetoric is the cornerstone, and for good reason. Speech was the first and remains the most common way for one person to inject the contents of his or her brain into the brain of another. The better one is at rhetoric, the more likely it is that the injection will change the listener’s brain. That is a compelling reason to train one’s brain to communicate effectively. Of course, speech can now come from non-human brains, and we are subject to having our brains changed by artificial intelligence.
References:
Chomsky, Noam, Language and Mind. New York: Harcourt Brace Jovanovich, Publishers, 1972. A dominant theory of the origin of language for much of the second half of the 20th century, now largely discarded.
Dehaene, Stanislas, Reading in the Brain: The New Science of How We Read. New York: Penguin Random House LLC, 2009. An excellent discussion of the neuroscience behind speech.
Mithen, Steven, The Language Puzzle: Piecing Together the Six-Million-Year Story of How Words Evolved. New York: Basic Books, 2024. The archaeology of speech.
Pinker, Steven, The Language Instinct: How the Mind Creates Language. New York: Harper Perennial Classics, 1994. A classic.
Illustrations are all from The Smithsonian Museum of Natural History Human Origins. https://humanorigins.si.edu/evidence/human-fossils/species.





