Some of the attendant questions could be resolved by traveling back in time and making direct observations. Still, once we’d observed what happened and when it happened, questions would remain. We still wouldn’t know the neural and cognitive mechanisms, for they are not apparent from behavior alone. But our observations of just what happened would certainly constrain the space of models we’d have to investigate.
Unfortunately, we can’t travel back in time to make those observations. That difficulty has the peculiar effect of reversing the inferential logic of the previous paragraph. We find ourselves in the situation of using our knowledge of neural and cognitive mechanisms to constrain the space of possible historical sequences.
Except, of course, that our knowledge of neural and cognitive mechanisms is not very secure. And large swaths of linguistics are mechanism free. To be sure, there may be an elaborate apparatus of abstract formal mechanism, but just how that mechanism is realized in step-by-step cognitive and neural processes, that remains uninvestigated, except among computational linguists.
The upshot of all this is that we must approach these questions indirectly. We have to gather evidence from a wide variety of disciplines – archeology, physical and cultural anthropology, cognitive psychology, developmental psychology, and the neurosciences – and piece it together. Such work entails a level of speculation that makes well-trained academicians queasy.
What follows is an out-take from Beethoven’s Anvil, my book on music. It’s about a thought experiment that first occurred to me while in graduate school in the mid-1970s. Consider the often astounding and sometimes absurd things that trainers can get animals to do, things they don’t do naturally. Those acts are, in some sense, inherent in their neuro-muscular endowment, but not evoked by their natural habitat. But place them in an environment ruled by humans who take pleasure in watching dancing horses, and . . . Except that I’m not talking about horses.
It seems to me that what is so very remarkable about the evolution of our own species is that the behavioral differences between us and our nearest biological relatives are disproportionate to the physical and physiological differences. The physical and physiological differences are relatively small, but the behavioral differences are large.
In thinking about this problem I have found it useful to think about how at least some chimpanzees came to acquire a modicum of language. The early attempts, which tried to teach chimpanzees to speak, all ended in failure. In the most intense of these efforts, Keith and Cathy Hayes raised a baby chimp in their household from 1947 to 1954. But that close and sustained interaction with Viki, the young chimp in question, was not sufficient. Then in the late 1960s Allen and Beatrice Gardner began training a chimp, Washoe, in Ameslan, a sign language used among the deaf. This effort was far more successful. Within three years Washoe had a vocabulary of 85 Ameslan signs and she sometimes created signs of her own.
The results startled the scientific community and precipitated both more research along similar lines (as well as work where chimps communicated by pressing iconically identified buttons on a computerized panel) and considerable controversy over whether or not ape language was REAL language. That controversy is of little direct interest to me, though I certainly favor the view that this interesting behavior is not really language. What is interesting is the fact that these various chimps managed even the modest language that they did.
The string of earlier failures had led to a cessation of attempts. It seemed impossible to teach language to apes. It would seem that they just didn’t have the capacity. Upon reflection, however, the research community came to suspect that the problem might have more to do with vocal control than with central cognitive capacity. And so the Gardners acted on that supposition and succeeded where others had failed. It turns out that whatever chimpanzee cognitive capacity was, it was capable of surprising things.
Note that nothing had changed about the chimpanzees. Those that learned some Ameslan signs, and those that learned to press buttons on a panel, were of the same species as those that had earlier failed to learn to speak. What had changed was the environment. The (researchers in the) environment no longer asked for vocalizations; the environment asked for gestures, or button presses. These the chimps could provide, thereby allowing them to communicate with the (researchers in the) environment in a new way.
It seemed to me that this provided a way to attack the problem of language origins from a slightly different angle. So I imagined that a long time ago groups of very clever apes – more so than any extant species – were living on the African savannas. One day some flying saucers appeared in the sky and landed. The extra-terrestrials who emerged were extraordinarily adept at interacting with those apes and were entirely benevolent in their actions. These creatures taught the apes how to sing and dance and talk and tell stories, and so forth. Then, after thirty years or so, the ETs left without a trace. The apes had absorbed the ETs’ lessons so well that they were able to pass them on to their progeny generation after generation. Thus human culture and history were born.
Now, unless you actually believe in UFOs, and in the benevolence of their crews, this little fantasy does not seem very promising, for it is a fantasy about things that certainly never happened. Further, even if this had happened, it does seem to remove the mystery from language’s origins. Instead of something from nothing we have language handed to us on a platter. We learned it from some other folks; perhaps they were little short fellows with green skin, or perhaps they were the modern-style aliens with pale complexions, catlike pupils in almond eyes, and elongated heads.
But, and here is where we get to the heart of the matter, what would have to have been true in order for this to have worked? Just as the chimps before Ameslan were genetically the same as those after, so the clever apes before alien instruction were the same as the proto-humans after. The species has not changed, the genome is the same – at least for the initial generation. The capacity for language would have to have been inherent in the brains of those clever apes. All the aliens did was activate that capacity. Once that happened the newly emergent proto-humans were able to sustain and further develop language on their own. Thus the critical event is something that precipitates a reconfiguration of existing capabilities.
However, language origins is not our problem. We are searching for the origins of music. So, instead of alien instruction in Hebrew or Sanskrit we can imagine alien instruction in samba or polka. The basic configuration and dynamics of the story remain the same. However, to make it real we have to get rid of those aliens and their instruction. Instead of the aliens we have only our group of clever apes. They are going to have to instruct one another. What we are looking for is a way to get a gestalt switch in group dynamics that supports new modes of neural dynamics in the brains of individuals who are interacting with one another in a group.
Let us call this the Gestalt Origins Hypothesis:
Gestalt Origins: The precursor to music arose when groups of hominids interacted in a way that triggered a new configuration of operation in their existing nervous system.
Notice that I talk of a precursor to music. I don’t think we can get from ape to music in a single bound. We need at least one precursor, something that is rhythmic, like music, but not yet fully formed. In order to get even that far, so my argument goes, our proto-humans need better control over their vocal cords than apes have, they need more rhythmic sophistication, and greater mimetic capacity.
Notice that this story says nothing about the adaptive value of music. I do intend to get around to that toward the end of the chapter, but that’s not my primary concern. My primary concern is getting our ancestors to the point where a gestalt switch can happen that will bring about a precursor to music, something we can call musicking. In order for that to happen we need to solve an adaptive problem or two. But those adaptations are not about music; they are about its precursors. Once musicking is going along smoothly we need another gestalt switch to differentiate it into language and music proper.