Culture Magazine

Arithmetic and Machine Learning, Part 2

By Bbenzon @bbenzon

Continuing from my post of 4.26.22, “Why is simple arithmetic difficult for deep learning systems?”, here is what I posted to LessWrong on 5.11.22:

I’ve been thinking about ordinary arithmetic computation in this context. We know that models have trouble with it. The issue interests me because arithmetic calculation has well-understood procedures. We know how people do it. And by that I mean that there’s nothing important about the process that’s hidden, unlike our use of ordinary language. The mechanisms of both sentence-level grammar and discourse structure are unconscious.

It's pretty clear to me that arithmetic requires episodic structure, to introduce a term from old symbolic-systems AI and computational linguistics. That’s obvious from the fact that we don’t teach it to children until grammar school, which is roughly when episodic-level cognition kicks in (see the paper Hays and I did, Principles and Development of Natural Intelligence).

Arithmetic is not like ordinary language, which comes to us naturally without much specific training. Fluency in arithmetic requires years of drill. First the child must learn to count; that gives numbers meaning. Once that is well in hand, children are drilled in arithmetic tables for the elementary operations, and so forth. Once this is going smoothly, one learns the procedures for multiple-digit addition and subtraction, multiple-operand addition, and then multiplication and division. Multiple-digit division is the most difficult because it requires guessing, which is then checked by actual calculation (multiplication followed by subtraction).

Why do such intellectually simple procedures require so much drill? Because each individual step must be correct. You can’t just go straight ahead. One mistake anywhere, and the whole calculation is thrown off.

Whatever a model is doing in inference mode, I doubt it’s doing anything like what humans do. Where would it pick that up on the web?

I don’t know what’s going on inside a model in inference mode, but I’d guess it’s something like this: The inference engine ‘consumes’ a prompt, which moves it to some position in its state space.

  1. It has a number of possibilities for moving to a new position.
  2. It picks one and emits a word.
  3. Is it finished? If so, stop. If not, return to 1.

And so it moves through its state space in a single unbroken traversal. You can’t do arithmetic that way. You have to keep track of partial results and stop to retrieve them so you can integrate them into the ongoing flow of the calculation.
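That loop can be sketched in code. Everything here is hypothetical: `ToyModel` is a stand-in (a tiny Markov chain whose “state” is just the last word emitted), not a real inference engine, but the shape of the traversal is the point.

```python
import random

class ToyModel:
    """Hypothetical stand-in for a trained model: a tiny Markov chain.
    Its 'position in state space' is simply the last word emitted."""
    TRANSITIONS = {
        "<start>": [("the", 1.0)],
        "the": [("cat", 0.5), ("dog", 0.5)],
        "cat": [("sat", 1.0)],
        "dog": [("sat", 1.0)],
        "sat": [("<end>", 1.0)],
    }

    def consume(self, prompt):
        # Consuming the prompt moves us to some position in state space.
        return "<start>"

    def next_options(self, state):
        return self.TRANSITIONS[state]

    def advance(self, state, token):
        return token

def generate(model, prompt, max_tokens=50):
    """One single unbroken traversal: no scratch memory, no partial
    results set aside and retrieved later."""
    state = model.consume(prompt)
    output = []
    for _ in range(max_tokens):
        options = model.next_options(state)      # 1. possibilities for a new position
        token = random.choices(
            [t for t, _ in options],
            weights=[p for _, p in options],
        )[0]                                     # 2. pick one and emit a word
        if token == "<end>":                     # 3. finished? if so, stop
            break
        output.append(token)
        state = model.advance(state, token)      #    if not, return to 1
    return " ".join(output)
```

The key feature, for my argument, is what the loop lacks: there is nowhere to reserve an intermediate result and come back for it later.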

So now the question is: What other kinds of tasks require the computational style that arithmetic does? Perhaps generating a long string of coherent prose does.

Let me think about that for a while.

I’m still thinking. This is going to be rough and crude, but it’s what I need to do at the moment. Sorry.

Episodic structure involves localizing things and events in time and space. So we’ve got “T & E” for things and events and “T•S” for time and space, thus: [T•S(T & E)]. A string of them:

[T•S(T & E)] --> [T•S(T & E)] --> [T•S(T & E)]

Or we could simplify: E --> E --> E. Or just: E1, E2, E3.

So:

E1: Johnny went through the door.
E2: Johnny walked past the tree.
E3: Johnny crossed the street.

In arithmetic, to add 15 and 7, we say (mentally) something like:

5 plus 7 equals 12
write 2, carry the 1
1 plus 1 equals 2
write 22

That looks something like:

E1: 5 + 7 = 12
E2 (reserve 1): write 2
E3 (retrieve 1): 1 + 1 = 2
E4 (2 concat 2): write 2_ = 22

My point is that the intermediate result is being tracked at the episode level, not the proposition level.
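That bookkeeping can be made explicit in code. The following is a sketch of grammar-school column addition, with the carry held as a named partial result that is reserved in one step and retrieved in the next:

```python
def column_add(a, b):
    """Add two non-negative integers digit by digit, the grammar-school way.
    The carry is an explicit partial result held between steps."""
    da = [int(d) for d in str(a)][::-1]   # digits, least-significant first
    db = [int(d) for d in str(b)][::-1]
    carry, digits = 0, []
    for i in range(max(len(da), len(db))):
        x = da[i] if i < len(da) else 0
        y = db[i] if i < len(db) else 0
        s = x + y + carry        # e.g. 5 + 7 = 12
        digits.append(s % 10)    # "write 2"
        carry = s // 10          # "carry the 1" -- reserved for the next episode
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in reversed(digits)))
```

Running `column_add(15, 7)` walks through exactly the four episodes above: it computes 12, writes the 2, carries the 1 into the next column, and concatenates to get 22. The carry variable is the part a single unbroken traversal has no place for.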

So that’s one thing. I’m also thinking about the fact that, in Vygotsky’s view, language involves internalizing an Other. And once we’ve done that and have it thoroughly routinized – leap of logic – we’re ready to learn to write and to do arithmetic. Why? Because we need that internalized Other to keep track of the distinction between the episode and the proposition(s) in the episode. The internalized Other marks the episode while we worry about the proposition(s).

Think about this: we need episodic structure to distinguish between signifier and signified. It’s once we’ve acquired episodic structure that we learn to read and write. Reading and writing force awareness of the signifier/signified distinction on us because they confront us with two different signifiers for the same signified.

And arithmetic calculation requires awareness of that distinction. “2 + 2”, “3 + 1”, “2 * 2”, and “9 – 5” (among many others), and “4” are all signifiers for the same cardinal value. How do we learn that strange fact? Through counting objects and working with collections of objects in conjunction with those number symbols. Counting is episodic. It allows us to see numerals as signifiers and counted objects as signifieds. Doing abstracted arithmetic forces us to treat the numerals as signifiers for imaginary objects.
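That many-signifiers-one-signified fact can be checked mechanically. A trivial sketch (using `eval` only because the strings are fixed literals from the paragraph above):

```python
# Many signifiers, one signified: each expression names the same cardinal value.
signifiers = ["2 + 2", "3 + 1", "2 * 2", "9 - 5", "4"]
values = {eval(expr) for expr in signifiers}  # safe here: fixed literal strings
assert values == {4}
```

Distinct strings, one value: the set of signifiers collapses to a single signified.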

These deep learning engines have no distinction between signifier and signified. They have no episodic structure. Theirs is a very thin and flat world.

More later.

