I’m neither worried nor pleased at the prospect of superintelligent computers. Why? Because they aren’t going to happen in the foreseeable future, if ever. I figure the future is going to be more interesting than that. Why? Because: the next singularity.
“The interests of humanity may change, the present curiosities in science may cease, and entirely different things may occupy the human mind in the future.” One conversation centered on the ever accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue. –Stanislaw Ulam, from a tribute to John von Neumann
Any sufficiently advanced technology is indistinguishable from magic. –Arthur C. Clarke
Singularities – in the sense of a new regime “beyond which human affairs, as we know them, could not continue” – are not new in human history. Nor are they hard-edged. They are most easily seen in retrospect. Our nineteenth century predecessors could not have imagined the Internet or microsurgery, nor could our ninth century predecessors have imagined the steam locomotive or the telephone.
I’m sure that future developments in computing will be amazing, and many of them likely will be amazing in ways we cannot imagine, but I doubt that our successors will see superintelligent computers, not in the foreseeable future and perhaps even not ever. Yes, there will be a time in the future when technology changes so drastically that we cannot now imagine, and thus predict, what will happen. No, that change is not likely to take the form of superintelligent computing.
Why do I hold these views? On the one hand there is ignorance. I am not aware of any concept of intelligence that is so well articulated that we could plan to achieve it in a machine in a way comparable to planning a manned mission to Mars. In the latter case we have accomplished relevant projects – manned flight to the moon, unmanned flight to the Martian surface – and have reason to believe that our basic grasp of the relevant underlying physical principles is robust. In the case of intelligent machines, yes, we do have a lot of interesting technology, none of which approximates intelligence as closely as a manned mission to the moon approximates a manned mission to Mars. More tellingly, we are not in possession of a robust understanding of the underlying mechanisms of intelligent perception, thought, and action.
And yet we do know a great deal about how human minds work and about, for example, how we have achieved the knowledge that allows us to build DNA sequencing devices, smart phones, or to support humans in near-earth orbit for weeks at a time. This knowledge suggests that super-intelligent computing is unlikely, at least if “super-intelligence” is defined to mean surpassing human intelligence in a broad and fundamental way.
Human Intelligence and Its Cultural Elaboration
When the work of developmental psychologist Jean Piaget finally made its way into the American academy in the middle of the last century the developmental question became: Is the difference between children’s thought and adult thought simply a matter of accumulated facts or is it about fundamental conceptual structures? Piaget, of course, argued for the latter. In his view the mind was constructed in “layers” where the structures of higher layers were constructed over and presupposed those of lower layers. It’s not simply that 10-year olds knew more facts than 5-year olds, but that they reasoned about the world in a more sophisticated way. No matter how many specific facts a 5-year old masters, he or she cannot think like a 10-year old because he or she lacks the appropriate logical forms. Similarly, the thought of 5-year olds is more sophisticated than that of 2-year olds and that of 15-year olds is more sophisticated than that of 10-year olds.
This is, by now, quite well known and not controversial in broad outline, though Piaget’s specific proposals have been modified in many ways. What’s not so well known is that Piaget extended his ideas to the development of scientific and mathematical ideas in history in the study of genetic epistemology. In his view later ideas developed over earlier ones through a process of reflective abstraction in which the mechanisms of earlier ideas become objects manipulated by newer emerging ideas. In a series of studies published in the 1980s and 1990s the late David Hays and I developed similar ideas about the long-term cultural evolution of ideas.
We theorized about cognitive ranks, where later ranks developed over earlier ones through reflective abstraction (see Mind-Culture Coevolution: Major Transitions in the Development of Human Culture and Society). Our fundamental paper is The Evolution of Cognition (Journal of Social and Biological Structures 13(4): 297-320, 1990) and the remainder of this section is adapted from it as is the next section. You will find full citations in that article.
The basic idea of cognitive rank was suggested by Walter Wiora’s work on the history of music, The Four Ages of Music (1965). He argued that music history be divided into four ages. The first age was that of music in preliterate societies and the second age was that of the ancient high civilizations. The third age is that which Western music entered during and after the Renaissance. The fourth age began with the twentieth century. (For a similar four-stage theory based on estimates of informatic capacity, see for example D. S. Robertson, The Information Revolution. Communication Research 17, 235-254.)
This scheme is simple enough. What was so striking to us was that so many facets of culture and society could be organized into these same historical strata. It is a commonplace that all aspects of Western culture and society underwent a profound change during the Renaissance. The modern nation state was born, the scientific revolution happened, art adopted new forms of realistic depiction, attitudes toward children underwent a major change, as did the nature of marriage and family, new forms of commerce were adopted, and so forth. If we look at the early part of the twentieth century we see major changes in all realms of symbolic activity – mathematics, the sciences, the arts – while many of our social and political forms remain structured on older models.
The transition between preliterate and literate societies cannot easily be examined because we know preliterate societies only by the bones and artifacts they've left behind and the historical record of the ancient high civilizations is not a very good one. Instead we have to infer the nature of these ancient cultures by reference to modern anthropological investigations of preliterate cultures (just as biologists must often make inferences about the anatomy, physiology, and behavior of extinct species by calling on knowledge of related extant species). When we make the relevant comparisons we see extensive differences in all spheres.
Social order in preliterate societies may involve nothing more than family relationships, or at most the society extends kinship by positing ancient common ancestors. With little or no apparatus of government, civil order is maintained by feud or fear of feud. In literate societies, social order is kept by etiquette, contract, and courts of equity, and civil order is maintained by police and courts of justice. In preliterate societies each community, of 5 to 500 members (and generally less than 200) is autonomous until, about 6000 years ago, chiefdoms appear in a few places: groups of villages forced into submission. In literate societies villages grow into towns and cities, which organize the villages of their hinterlands into kingdoms. Preliterate societies depend on the skills of hunting and gathering, of slash-and-burn farming, pottery, and a few more crafts, which are sound and effective where they exist. In literate societies certain persons trained to think choose to think about farming and write manuals for the agrarian proprietor – and eventually manuals of other crafts appear. Finally, Lawrence Kohlberg has found evidence that people in preliterate societies have less sophisticated moral concepts than people in literate societies.
The appearance of writing was followed by the Mosaic law and the prophets of Israel, and by the Periclean Age in Athens. The architecture, democratic political system, and above all the philosophy – both natural and moral – of the Hebrews and Greeks were so different from all predecessors that we tend to think of our civilization as beginning with them. In fact, a period of cultural regression followed the fall of Rome, and before the Renaissance could begin, a “little renaissance” beginning about A.D. 1000 and reaching its peak with Aquinas in the 13th Century was necessary to raise Europe once more to a literate level. Our civilization combines elements of Greek, Roman, and Hebrew antiquity with Moslem, Indian, Chinese, and Germanic elements.
Socio-Cultural Singularities
Hays and I believed that these four ages, the systematic differences between cultures at these four levels of cultural evolution, are based on differences in cognitive mechanism. As cultures evolve they differentiate and become more complex and sophisticated, the more sophisticated cultures having cognitive mechanisms unavailable to the less sophisticated. Over the long term this process is discontinuous; it passes through singularities in the sense of Ulam and von Neumann. People on the old side of a socio-cultural singularity cannot imagine the world of people on the new side.
The post-singularity modes of thought and action permit a dramatic reworking of culture and society, a reworking that is ultimately engendered by a new capacity for manipulation of abstractions. The thinkers, artists, and social actors on the new side of such a singularity are thus operating with a different ontology – to use a philosophical term that entered computer science and AI in the last few decades – from that employed by those on the old side. The new ideas and practices cannot be reduced to strings of ideas stated within the old ontology, though crude approximations are often possible. Indeed, science and technology journalists use such crude approximations to convey the newer ideas to a general audience. But those approximations do not give you the ability to use the newer ideas in a powerful way.
These several kinds of thinking are cumulative; a simpler kind of thinking does not disappear from a culture upon the introduction of a more complex kind. A culture is assigned a rank according to the highest kind of thinking available to a substantial fraction of its population. That a culture is said to be of Rank 3 thus doesn’t imply that all adult members have a Rank 3 system of thought. It means only that an influential group, a managing elite if you will, operates with a Rank 3 cognitive system. The rest of the population will have Rank 1 and Rank 2 conceptual systems.
Each cognitive process is associated with a new conceptual mechanism, which makes the process possible, and a new conceptual medium that allows the mechanism and process to become routine in the culture. This is an important point. The general effectiveness of a culture is not determined only by the achievements of a few of its most gifted members. What matters is what a significant, though perhaps small, portion of the population can achieve on a routine basis. The conceptual medium allows for the creation of an educational regime through which a significant portion of the population can learn effectively to exploit the cognitive process, can learn to learn in a new way.
Here is the scheme Hays and I proposed:
Rank | Process | Mechanism | Medium
Rank 1 | Abstraction | Metaphor | Speech
Rank 2 | Rationalization | Metalingual Definition | Writing
Rank 3 | Theory | Algorithm | Calculation
Rank 4 | Model | Control | Computation
In an earlier paper (Benzon and Hays, Principles and Development of Natural Intelligence, J. Social and Biological Structures, 11, 293-322, 1988) we argued that the human brain is organized into five layers of perceptual and cognitive processors. We called the top layer the gnomonic system and thought of it as organizing the interaction between the lower four layers (see also Hays, Cognitive Structures, HRAF Press, 1981). All abstractions form in the gnomonic system. The cognitive processes that concern us here are all regulated by this gnomonic system. Hence for my present purposes it is convenient to collapse this system into a two-level structure, with the gnomonic layer on top in an abstraction system and the other four layers on the bottom, collectively, the concrete system.
With a Rank 2 structure, Aristotle was able to write his philosophy. He presented it as an analysis of nature but we take it to be a reconstruction of the prior cognitive structure. In the Renaissance, some thinkers developed cognitive structures of rank 3. Exploitation of such structures produced all of science up through the late nineteenth century. Beginning, perhaps, with Darwin and going on to Freud, Einstein, and many others, a new kind of cognitive process appears. To account for it, we call on rank 4 processes. We understand earlier science to be a search for the one true theory of nature, whereas we understand the advanced part of contemporary science to be capable of considering a dozen alternative theories of nature before breakfast (with apologies to Lewis Carroll). The new thinker can think about what the old thinker thought with. And indeed we use that sentence to summarize the advance of each cognitive rank over its predecessor.
Computing Machines, Limitations and Promise
While we can trace computing back to the ancients, as many have, the computing that most matters is the product of mid-20th century work in mathematics, logic, and engineering. The term “artificial intelligence” was coined in 1956 at a Dartmouth conference, but arguably the practical pursuit began with the slightly earlier work on the machine translation of natural language, though language research didn’t join up with AI until the 1960s. Those classic systems, whether the expert systems of AI or the grammars and parsers of computational linguistics, consisted of hand-coded symbolic knowledge expressed in some appropriate logical formalism.
Early success in modeling expert knowledge on technical topics was followed by increasing frustration in trying to deal with common sense and with the sensory tasks that were easily mastered by young children, such as the visual recognition of objects or the recognition of spoken language. Common sense knowledge of the world was required to understand (and to generate) simple stories, a problem that investigators attacked in the 1970s. Common sense knowledge seems to consist of a never ending stream of individual facts – e.g. that umbrellas provide protection against the rain (and that rain consists of drops of water...), that they consist of a handle, flexible ribs, a covering, that some are collapsible, and some not, etc. – that fall into no logical scheme, but that a computing system must know in order to negotiate (stories about) the everyday world.
How do we capture, accumulate, and code all these bits of knowledge?
As an example of sensory tasks, consider the ARPA Speech Understanding Project of the mid-1970s. [ARPA: Advanced Research Projects Agency, now DARPA: Defense Advanced Research Projects Agency] The object was simple: the computer would take a vocal question about naval vessels and answer it by querying a database. Even when syntactic, semantic and pragmatic considerations were brought to bear on the speech stream, speech recognition was poor.
This project is so old that the Wikipedia doesn’t have an account of it. Try Dennis H. Klatt, Review of the ARPA Speech Understanding Project, J. of the Acoustical Society of America, 62, 1345 (1977); http://dx.doi.org/10.1121/1.381666
But a generation later, statistical techniques achieved much better speech recognition without calling on syntactic, semantic, and pragmatic knowledge. The mid-1970s techniques that failed were based on hand-coded knowledge. The 1990s techniques that succeeded were based on statistically sophisticated machine learning. Rather than program the computer with knowledge of phonetics and phonology, it was programmed with a learning method and then trained on a large body of real data until it was able to recognize speech sounds with a fairly high level of accuracy.
THAT, in general, is what happened within AI during the 1980s and 1990s: meticulously hand-coded, theory-driven knowledge was replaced by statistical learning over massive databases. And performance improved. Statistical learning techniques were applied, not only to speech, but also to the visual world, and to robotic actions in the world. So-called subsumption architectures guided systems in bottom-up learning about the physical world.
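The contrast between hand-coded knowledge and learned knowledge can be made concrete with a toy sketch in Python. This is not a description of any actual speech system: the feature values, the labels, and the function names (`hand_coded`, `train`, `classify`) are all invented for illustration, and the nearest-centroid method is assumed here only as a stand-in for the far more sophisticated statistical models used in real recognizers.

```python
from statistics import mean

# Hypothetical labeled training data: (feature value, sound label).
# The numbers are invented for illustration; they are not real acoustic measurements.
training = [(0.9, "a"), (1.1, "a"), (1.0, "a"),
            (2.9, "i"), (3.1, "i"), (3.0, "i")]

def hand_coded(x):
    """Classical approach: the programmer writes the decision rule directly."""
    return "a" if x < 1.5 else "i"

def train(examples):
    """Statistical approach: estimate one centroid per label from the data."""
    labels = {label for _, label in examples}
    return {label: mean(v for v, l in examples if l == label)
            for label in labels}

def classify(centroids, x):
    """Assign x to the label whose learned centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = train(training)
print(hand_coded(2.4))           # rule supplied by the programmer
print(classify(centroids, 2.4))  # rule estimated from the examples
```

In the first function the knowledge lives in the threshold the programmer typed. In the second it lives in the training data, so improving the system means supplying more or better examples, not rewriting rules.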
Within the domain of language, IBM’s Watson is the best-known example of the newer machine-learning techniques. Such performance – beating human champions at Jeopardy – was light years beyond the question-answer technology of classical AI, expert-knowledge hand-coded into symbolic systems. While it remains to be seen whether or not Watson-class technology will reap the practical benefits IBM is seeking from it, there is no doubt that these newer statistical techniques are supporting remarkable practical technology, such as Apple’s Siri.
And yet, the limitations of these systems are also obvious. Listen to this presentation that David Ferrucci, lead developer on the Watson project, gave to the Allen Institute for Artificial Intelligence at the end of last year:
Here’s part of Ferrucci's comment on the video:
This talk draws an arc from Theory-Driven AI to Data-Driven AI and positions Watson along that trajectory. It proposes that to advance AI to where we all know it must go, we need to discover how to efficiently combine human cognition, massive data and logical theory formation. We need to boot strap a fluent collaboration between human and machine that engages logic, language and learning to enable machines to learn how to learn and ultimately deliver on the promise of AI.
From that point of view Ferrucci’s most interesting remarks are at the end of the video, starting at 1:22:00 or so. Ferrucci imagines an interactive system where the computer uses human input to learn how to extend an internal knowledge representation, where that internal model is based on classical principles. Watson-style “shallow” technology provides the scaffolding that supports the computing system’s side of the man-machine interaction. Ferrucci believes that a team of ten people could create a qualitative leap forward in three to five years.
I don’t know whether or not he’s right. Maybe it’ll take 15 or 20 people, and maybe they’ll need 10 years. Who knows? But I think the general thrust is sound. We have to figure out deep and powerful ways to create hybrid systems incorporating top-down symbolic ‘knowledge’ and bottom-up learning at the sensory-motor interface.
What About the Next Singularity?
Let’s begin assessing our current situation by looking at the observed course of cultural evolution to date. We’re interested in the transition from one rank to the next and the interval between one rank-shift, or singularity, and the next. When looking at the following table, keep in mind that these transitions are not sharp; they take time. All we’re after, though, is a crude order of magnitude estimate:
Rank | Informatics | Emergence, years ago
Rank 1 | Speech | 50,000
Rank 2 | Writing | 5,000
Rank 3 | Calculation | 500
Rank 4 | Computing | 50
The interval between one transition and the next seems to have shrunk by an order of magnitude each time.
One does not have to look at those numbers for very long before wondering just what started emerging five years ago. While there is nothing in the theory that forbids the emergence of a fifth, or a sixth rank, and so on, it doesn’t seem plausible that the time between ranks can continue to diminish by an order of magnitude. The emergence of a new system of thought, after all, does not appear by magic. People have to think it into existence: How much time and effort is required to transcend the system of thought in which a person was raised? THAT limits just how fast new systems of thought can arise.
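The arithmetic behind that “five years ago” is a naive geometric extrapolation. Here is a minimal sketch in Python, assuming the rough dates in the table above and assuming (implausibly, as just argued) that the tenfold shrinkage simply repeats; the names `successive_ratios` and `naive_next` are illustrative only.

```python
# Approximate emergence dates from the table, in years before the present.
emergence_years_ago = {1: 50_000, 2: 5_000, 3: 500, 4: 50}

def successive_ratios(values):
    """Ratio of each value to the one that follows it."""
    return [a / b for a, b in zip(values, values[1:])]

def naive_next(values):
    """Extrapolate one more term, assuming the last ratio simply repeats."""
    last_ratio = values[-2] / values[-1]
    return values[-1] / last_ratio

dates = [emergence_years_ago[rank] for rank in sorted(emergence_years_ago)]
ratios = successive_ratios(dates)
rank5_years_ago = naive_next(dates)

print(ratios)           # each step is a factor of ten
print(rank5_years_ago)  # the naive "Rank 5" date, in years ago
```

Every ratio comes out at 10, so the naive next term is 5 years. The calculation is only a pattern in four crude estimates; it says nothing about how fast people can actually think a new system of thought into existence.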
Now, let us assume that all computing technology to date is the result of Rank 4 thinking, including, for example, IBM’s Watson. Let us further assume that research programs such as the one David Ferrucci has proposed will remain within the scope of Rank 4 thinking, though they may be pushing its outer limits. As those programs develop and mature, their best and brightest investigators – and perhaps the youngest as well – may well find themselves thinking about the mechanisms their teachers thought with.
The stage is now set for the emergence of Rank 5 thinking, whatever that is.
These rebel investigators will then break away, if they have to – though if we’ve learned anything about such matters, our institutions won’t force that on them – and start crafting new (post)computing systems based on Rank 5 ideas from the ground up. Those are the systems we’re going to be watching. They will incorporate both a highly structured conceptual architecture–such as the one inherent in the computational geometry of the human brain–and the means to learn new concepts into/with that architecture.
But I don’t expect these systems to accelerate to superintelligence – Shazaam! – just like that. Consider these remarks by Robin Hanson, I Still Don’t Get Foom:
“Intelligence” just means an ability to do mental/calculation tasks, averaged over many tasks. I’ve always found it plausible that machines will continue to do more kinds of mental tasks better, and eventually be better at pretty much all of them. But what I’ve found it hard to accept is a “local explosion.” This is where a single machine, built by a single project using only a tiny fraction of world resources, goes in a short time (e.g., weeks) from being so weak that it is usually beat by a single human with the usual tools, to so powerful that it easily takes over the entire world.
I would further add that this system HAS to build its way through four ranks of human cognitive ontologies, one after the other, just as humans do during the course of their education. As far as I know, for example, no human has ever learned rank 4 physics, such as quantum mechanics, straight out of grade school without first having mastered several “layers” of mathematical and scientific prerequisites. That’s just not how things go.
It’s one thing to keep ramping up computational throughput to way beyond human range, but mere capacity isn’t enough. Being superfast at Aristotelian physics doesn’t magically morph into a capacity for understanding Newtonian mechanics, nor does the capacity to zip through Newtonian mechanics result in the ability to Foom! into a knowledge of quantum mechanics. Aristotelian physics employs one ontology; Newtonian mechanics employs a different one; and the ontology of quantum mechanics is different still. Computational speed does not drive capability from one ontology to another; that requires a fundamental change in representational mechanism.
On that account alone I’m deeply skeptical about superintelligence. This hypothetical device has to acquire and construct its superknowledge “from the inside” since no one is going to program it into superintelligence (we just don’t have the required engineering knowledge). If it is going to be superintelligent in this world – and what other world is there? – it is going to have to start learning from the sensory-motor world of the infant and construct its world knowledge layer by layer. Ben Goertzel has proposed a Piagetian learning regime for an artificial general intelligence (Patterns, Hypergraphs and Embodied General Intelligence (PDF), 2006), but success there only gets us a boisterous adolescent with ridiculous calculating skills. The machine still has to work its way up to at least Rank 5 cognitive skills – for there are many humans currently thinking at high Rank 4, including those who seek to build this machine.
It is those humans that really interest me. These will be thinkers, designers, and implementers for whom the sophisticated interweaving of top-down symbolic and bottom-up statistical processes is not a matter of hybrid engineering but is, instead, simply their rock-bottom sense of how things work. Those people will be living on the far side of the singularity we’ve been approaching since, say, the end of World War II.
What will they build? What kind of dreams will they have?