I am, at least provisionally, calling that learnable structure the metaphysical structure of the world. Moreover, since humans did not arise de novo, that metaphysical structure must necessarily extend through the animal kingdom and, who knows, plants as well.
“How”, you might ask, “does this metaphysical structure of the world differ from the world’s physical structure?” I will say, again provisionally, for I am just now making this up, that it is a matter of intension rather than extension. Extensionally the physical and the metaphysical are one and the same. But intensionally, they are different. We think about them in different terms. We ask different things of them. They have different conceptual affordances. The physical world is meaningless; it is simply there. It is in the metaphysical world that we seek meaning. [See my post, There is a fold in the fabric of reality. (Traditional) literary criticism is written on one side of it. I went around the bend years ago.]
* * * * *
Does this make sense, philosophically? How would I know?
I get it, you’re just making this up.
Right.
Hmmmm… How does this relate to that object-oriented ontology stuff you were so interested in a couple of years ago?
Interesting question. Why don’t you think about it and get back to me.
I mean, that metaphysical structure you’re talking about, it seems almost like a complex multidimensional tissue binding the world together. It has a whiff of a Latourian actor-network about it.
Hmmm… Set that aside for a while. I want to go somewhere else.
Still on GPT-3, eh?
You got it.[1]
* * * * *
Text reflects this learnable, this metaphysical, structure, albeit at some remove.
There are two things in play: 1) the fact that the text is learnable, and 2) that it is learnable by a statistical process. How are these two related?
If we already had an explicit ‘old school’ propositional model in computable form, then we wouldn’t need statistical learning at all. We could just run the propositional model over the corpus and encode the result. But why do even that? If we can read the corpus with the propositional model, in a simulation of human reading, then there’s no need to encode it at all. Just read whatever aspect of the corpus is needed at the time.
So, statistical learning compensates for the lack of a usable propositional model. The statistical model does work, but at the expense of explicitness.
But why does the statistical model work at all? That’s the question.
It’s not enough to say, because the world itself is learnable. That’s true for the propositional model as well. Both work because the world is learnable.
* * * * *
BUT: Humans don’t learn the world with a statistical model. We learn it through a propositional engine floating over an analog or quasi-analog engine with statistical properties. And it is the propositional engine that allows us to produce language. A corpus is a product of the action of a propositional engine, not a statistical model, acting on the world.
Description is one basic such action; narration is another. Analysis and explanation are perhaps more sophisticated and depend on (logically) prior description and narration. Note that this process of rendering into language is inherently and necessarily a temporal one. The order in which signifiers are placed into the speech stream depends in some way, not necessarily obvious, on the relations among the correlative signifieds in semantic or cognitive space. Distances between signifiers in the speech stream reflect distances between correlative signifieds in semantic space. We thus have systematic relationships between positions and distances of signifiers in the speech stream, on the one hand, and positions and distances of signifieds in semantic space, on the other. It is those systematic relationships that allow statistical analysis of the speech stream to reconstruct semantic space.
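To make that last claim a bit more concrete, here is a toy sketch of my own devising. It is not how GPT-3 works; it is the old, simple co-occurrence trick from distributional semantics, and the corpus, the window size, and the function names (`cooccurrence_vectors`, `cosine`) are all made up for illustration. Each word is represented by counts of the words that appear near it in the stream, and two words are compared by the cosine of the angle between their count vectors.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(tokens, window=2):
    """For each word, count the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: 'cat' and 'dog' occur in similar contexts, 'sun' does not.
corpus = (
    "the cat chased the mouse . the dog chased the cat . "
    "the cat ate the fish . the dog ate the bone . "
    "the sun warmed the stone . the sun warmed the sand ."
).split()

vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_sun = cosine(vecs["cat"], vecs["sun"])
print(sim_cat_dog > sim_cat_sun)
```

In this corpus ‘cat’ comes out closer to ‘dog’ than to ‘sun’, using nothing but positions in the stream. That is a very crude shadow of semantic space, but it is a shadow recovered from distances between signifiers alone.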
Note that time is not extrinsic to this process. Time is intrinsic and constitutive of computation. Speaking involves computation, as does the statistical analysis of the speech stream.
The propositional engine learns the world via Gärdenfors’ dimensions [2], and whatever else, Powers’ stack for example [3]. Those dimensions are implicit in the resulting propositional model and so become projected onto the speech stream via syntax, pragmatics, and discourse structure. The language engine is then able to extract (a simulacrum of) those dimensions through statistical learning. Those dimensions are expressed in the parameter weights of the model. THAT’s what makes the knowledge so ‘frozen’. One has to cue it with actual speech.
The whole language model thus functions as associative memory [4]. You present it with an input cue, and it associates from that cue, with each emitted string ‘feeding back’ into the memory bank as a new cue.
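Here is a minimal sketch of that cue-and-feedback loop, under heavy assumptions. The ‘memory bank’ is just a table of which word followed which in a tiny made-up corpus, nothing remotely like the parameter weights of a real language model, and the names `build_memory` and `recall` are mine. The only point is the shape of the process: a cue retrieves an associate, and the associate becomes the next cue.

```python
from collections import Counter, defaultdict


def build_memory(tokens):
    """Store, for each word, counts of the words that immediately followed it."""
    memory = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        memory[prev][nxt] += 1
    return memory


def recall(memory, cue, steps=5):
    """Associate from a cue, feeding each emitted word back in as the next cue."""
    out = [cue]
    for _ in range(steps):
        followers = memory.get(out[-1])
        if not followers:
            break  # nothing is associated with this cue
        # Take the most frequent follower; break ties alphabetically so the
        # result is deterministic.
        nxt = min(followers, key=lambda w: (-followers[w], w))
        out.append(nxt)
    return out


# Hypothetical toy corpus, for illustration only.
corpus = (
    "time is intrinsic to computation and computation is intrinsic to speech"
).split()

memory = build_memory(corpus)
print(" ".join(recall(memory, "time", steps=4)))
# prints "time is intrinsic to computation"
```

Notice that the table does nothing until it is cued. A word it has never seen in cue position, such as ‘speech’ here, gets you nothing back, which is one small sense in which the knowledge is ‘frozen’.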
[1] This post is an exploration of ideas raised in the course of thinking about GPT-3. See William Benzon, GPT-3: Waterloo or Rubicon? Here be Dragons, Working Paper, August 5, 2020, 32 pp., Academia: https://www.academia.edu/s/9c587aeb25; SSRN: https://ssrn.com/abstract=3667608; ResearchGate: https://www.researchgate.net/publication/343444766_GPT-3_Waterloo_or_Rubicon_Here_be_Dragons.
[2] Peter Gärdenfors, Conceptual Spaces: The Geometry of Thought, MIT Press, 2000; The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT Press, 2014.
[3] William Powers, Behavior: The Control of Perception, Aldine, 1973. A decade later David Hays integrated Powers’ model into his cognitive network model: David G. Hays, Cognitive Structures, HRAF Press, 1981.
[4] The idea that the brain implements associative memory in a holographic fashion was championed by Karl Pribram in the 1970s and 1980s. David Hays and I drew on that work in an article on metaphor, William Benzon and David Hays, Metaphor, Recognition, and Neural Process, The American Journal of Semiotics, Vol. 5, No. 1 (1987), 59-80, https://www.academia.edu/238608/Metaphor_Recognition_and_Neural_Process.