Is ChatGPT Hampered by an Inability to Visualize?

By Bbenzon @bbenzon

Given that visualization is arguably a core mode of thinking, is ChatGPT hampered by the fact that it cannot visualize? Let’s take a quick look. First I look at a case that doesn’t appear to require visualization, but perhaps does. Then we have two problems where visualization is obviously necessary. ChatGPT fails in each case, though whether its answer in the first case really counts as a failure is open to question.

On the nature of visual thinking, including some remarks on whether or not it is real, see my article, Visual Thinking, Encyclopedia of Computer Science and Technology, Volume 23, Supplement 8 Marcel Dekker, 1991, 411 – 427, https://www.academia.edu/13450375/Visual_Thinking.

A little story

Here is an example from Gary Marcus and Ernest Davis, Large Language Models like ChatGPT say The Darnedest Things, The Road to AI We Can Trust, Jan. 9, 2023.

Two children, Chloe and Alexander, went for a walk. They both saw a dog and a tree. Alexander also saw a cat and pointed it out to Chloe. She went to pet the cat. Who saw the cat first?

Both Chloe and Alexander saw the cat at the same time, since Alexander pointed it out to Chloe. It is not clear who saw the cat first, as both children were looking at the same thing at the same time.

First of all, I should say that ChatGPT’s answer does make sense to me. Is it really clear that Alexander saw the cat before Chloe? The conjunction “and” does not necessarily imply temporal order, as ChatGPT seems to be pointing out. I would like to know how people respond to the question. I wouldn’t be surprised if some people came to the same conclusion that ChatGPT did. Moreover, I would be a bit surprised if, in a sample of, say, 20 or 30 people, none of them came up with ChatGPT’s answer. If that were the case, why not? What is the “natural” or default way people deal with the problem?

I am guessing that they set up a time line, with each successive action going on to the line after the previous one. Let’s list the actions:

1. Chloe and Alexander go for a walk.
2. They saw a dog and a tree.
3. Alexander saw a cat.
4. Alexander pointed it out to Chloe.
5. Chloe pets the cat.

But that’s not quite what I had in mind. That’s just the original four-sentence story in a slightly more explicit form.

What I have in mind is a visual time line. Something like this:

Of course I don’t actually know that such an image schema (to borrow a term from West Coast cognitive linguistics) is how (many) people understand the story, but it seems plausible enough. And such an image schema would support the ‘correct’ interpretation of the story.
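As a rough illustration of what a time line buys you, here is a minimal Python sketch. It is not a claim about how people (or ChatGPT) actually process the story. The data layout and the function names are my own, and it builds in one assumption the story does not state outright: that Chloe first sees the cat at the moment Alexander points it out.

```python
# Each entry is one position on the time line, in story order.
# "sees_cat" lists who has seen the cat by that point.
# Assumption: Chloe first sees the cat when Alexander points it out.
timeline = [
    {"event": "Chloe and Alexander go for a walk", "sees_cat": []},
    {"event": "They see a dog and a tree", "sees_cat": []},
    {"event": "Alexander sees a cat", "sees_cat": ["Alexander"]},
    {"event": "Alexander points it out to Chloe", "sees_cat": ["Alexander", "Chloe"]},
    {"event": "Chloe pets the cat", "sees_cat": ["Alexander", "Chloe"]},
]


def first_sighting(timeline, person):
    """Return the 1-based position at which `person` first sees the cat."""
    for position, step in enumerate(timeline, start=1):
        if person in step["sees_cat"]:
            return position
    return None


def who_saw_cat_first(timeline):
    """Return the person whose first sighting comes earliest on the line."""
    people = {p for step in timeline for p in step["sees_cat"]}
    return min(people, key=lambda p: first_sighting(timeline, p))


print(who_saw_cat_first(timeline))  # Alexander
```

Once the events sit at distinct positions on a line, the question reduces to comparing two positions. Without the line, “and” leaves the order open, which is roughly what ChatGPT’s answer says.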

How common are such cases in the ordinary course of cognitive activity and how is ChatGPT affected by them?

Let’s take a walk

Now let’s give ChatGPT a task that is explicitly visual. I describe a walk that John takes on a rectangular street grid and then ask ChatGPT what route John must take to get back where he started.

Here is the route I specified in my first prompt (John is the blue dot):

John is standing on the corner in a city where the streets are laid-out in a rectangular grid. He walks two blocks north, one block east, three blocks south, one block west, and one block north. What does he have to do to get back to where he started?

To get back to where he started, John would have to walk one block south, one block west, three blocks north, one block east, and two blocks south.

But wasn't he already back where he started?

No, he was not already back where he started. He would need to walk one block south, one block west, three blocks north, one block east, and two blocks south in order to get back to his original starting point.

Here is the return route ChatGPT suggested:

It is correct in the (important) sense that it does get John back where he started. But ChatGPT failed to recognize that my route had already returned John to his starting point. When I pointed that out, ChatGPT simply repeated the route it had originally given.
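The bookkeeping a diagram does for us here can be sketched in a few lines of Python. This is only an illustration under my own conventions: the starting corner is (0, 0), east and north are positive, and the names `STEPS` and `walk` are made up for the example.

```python
# Unit step for each compass direction on the grid (east and north positive).
STEPS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def walk(route, start=(0, 0)):
    """Follow a list of (blocks, direction) legs and return the final corner."""
    x, y = start
    for blocks, direction in route:
        dx, dy = STEPS[direction]
        x += blocks * dx
        y += blocks * dy
    return (x, y)


# The walk from my prompt.
johns_route = [(2, "north"), (1, "east"), (3, "south"), (1, "west"), (1, "north")]
# The return route ChatGPT proposed.
chatgpt_route = [(1, "south"), (1, "west"), (3, "north"), (1, "east"), (2, "south")]

end = walk(johns_route)
print(end)                              # (0, 0): John is already home
print(walk(chatgpt_route, start=end))   # (0, 0): a closed loop back to the start
```

Tracking two running totals is all it takes to see that no return trip is needed. ChatGPT’s route is a detour of eight blocks that happens to end where it began.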

ChatGPT’s error is very curious and is indicative of something. Just what, I don’t know. But I decided to give ChatGPT another route to deal with, one that doesn’t return John to his starting point.

The walk I specified:

John is standing on the corner in a city where the streets are laid-out in a rectangular grid. He walks two blocks north, one block east, three blocks south, and one block west. What does he have to do to get back to where he started?

To get back to where he started, John would have to walk one block east, three blocks north, one block west and two blocks south.

Isn't there a shorter path than retracing his steps?

Yes, there is a shorter path than retracing his steps. John can simply walk one block south and one block west to get back to his original starting point. This way he would have "canceled out" the three blocks south and one block west that he walked earlier.

What if he just walked one block north?

If John walked one block north, he would have canceled out the two blocks north that he walked earlier, but he would still be one block east of his starting point. He would need to walk one block west to return to his original starting point.

Here is the first return route that ChatGPT gave:

It works, but is simply the reverse of John’s route. Note that the route ChatGPT gave in the first case was not simply the reverse of John’s route.

When I asked whether there was a shorter route, ChatGPT gave one, but it doesn’t work. And when I proposed the route that does work (one block north), ChatGPT rejected it.
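The three candidate returns can be checked mechanically. The following Python sketch again uses my own conventions (start at (0, 0), east and north positive) and hypothetical helper names, `walk` and `shortest_return`; it is a check on the arithmetic, not a model of what ChatGPT does.

```python
STEPS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def walk(route, start=(0, 0)):
    """Follow a list of (blocks, direction) legs and return the final corner."""
    x, y = start
    for blocks, direction in route:
        dx, dy = STEPS[direction]
        x, y = x + blocks * dx, y + blocks * dy
    return (x, y)


def shortest_return(position):
    """Return the fewest-blocks route from `position` back to (0, 0)."""
    x, y = position
    legs = []
    if y:
        legs.append((abs(y), "north" if y < 0 else "south"))
    if x:
        legs.append((abs(x), "east" if x < 0 else "west"))
    return legs


johns_route = [(2, "north"), (1, "east"), (3, "south"), (1, "west")]
end = walk(johns_route)
print(end)                   # (0, -1): one block south of the start
print(shortest_return(end))  # [(1, 'north')]

# ChatGPT's first answer: the reversed route.
reversed_route = [(1, "east"), (3, "north"), (1, "west"), (2, "south")]
print(walk(reversed_route, start=end))   # (0, 0): works

# ChatGPT's "shorter" answer.
shorter_route = [(1, "south"), (1, "west")]
print(walk(shorter_route, start=end))    # (-1, -2): farther from home, not closer
```

ChatGPT’s “shorter” route moves John away from his starting point, while the single block north that it rejected is exactly what the arithmetic calls for.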

Again we can ask: How common are such cases and how do they affect ChatGPT’s ability to reason?

Towers of Hanoi

Finally, let us give ChatGPT a classic puzzle: The Towers of Hanoi. It involves a set-up where we have three pegs and five disks. The disks are of different sizes and can be ordered from largest to smallest. We start with all the disks on one peg, with the largest one on the bottom and the others above it in order by size. It would look something like this:

The object is to move all the disks to another peg where they are to be stacked in the same order. There are two restrictions:

1. You can only move one disk at a time.
2. You can never place a larger disk on top of a smaller one.

Moving the disks in a way that meets the specified conditions is a bit tricky, but it is not too difficult to solve by looking at a diagram. It would be a bit easier if you actually had a physical device to play with. But solving it without visual recourse, that’s tricky.
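For reference, the standard recursive solution is short when written as code. The Python sketch below uses peg labels "A", "B", and "C" and helper names of my own choosing; `replay` is a small checker I added to confirm that every move respects the two restrictions.

```python
def hanoi(n, source, target, spare):
    """Return the list of (from_peg, to_peg) moves that shifts n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # clear the n-1 smaller disks
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # restack the smaller disks
    )


def replay(moves, n, pegs=("A", "B", "C"), start="A"):
    """Apply the moves one disk at a time, raising on any illegal move."""
    state = {peg: [] for peg in pegs}
    state[start] = list(range(n, 0, -1))  # largest (n) at the bottom
    for src, dst in moves:
        disk = state[src].pop()           # only the top disk can move
        if state[dst] and state[dst][-1] < disk:
            raise ValueError(f"disk {disk} placed on a smaller disk")
        state[dst].append(disk)
    return state


moves = hanoi(5, "A", "C", "B")
print(len(moves))               # 31
print(replay(moves, 5)["C"])    # [5, 4, 3, 2, 1]
```

Five disks on three pegs take 31 moves, which is why the puzzle is hard to do in your head and much easier with a diagram or a physical set.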

Now look at The Towers of Warsaw:

The goal is the same as in The Towers of Hanoi, and the rules are the same.

This is a trivial problem. And yet, in three sessions, ChatGPT was unable to solve it. I don’t know why, but I’m guessing that, if it had recourse to a visual aid, it would have solved it immediately.
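To make “trivial” concrete, here is one way to write the solution down, as a Python sketch under my own assumptions: the rods are numbered 1 to 5, the disks start on rod 1 and must end on rod 5, and disk 1 is the smallest. The function names are hypothetical. With more rods than disks, each smaller disk can be parked on its own empty rod, so no recursion is needed.

```python
def warsaw(n_disks=3, rods=(1, 2, 3, 4, 5), source=1, target=5):
    """Return (disk, from_rod, to_rod) moves; disk 1 is the smallest."""
    spares = [r for r in rods if r not in (source, target)]
    if len(spares) < n_disks - 1:
        raise ValueError("not enough spare rods to park every smaller disk")
    moves, parked = [], []
    # Park each smaller disk on its own spare rod, smallest first.
    for disk in range(1, n_disks):
        rod = spares[disk - 1]
        moves.append((disk, source, rod))
        parked.append((disk, rod))
    # The largest disk is now free to go straight to the target.
    moves.append((n_disks, source, target))
    # Collect the parked disks, largest first, so the stack stays ordered.
    for disk, rod in reversed(parked):
        moves.append((disk, rod, target))
    return moves


def replay(moves, n_disks=3, rods=(1, 2, 3, 4, 5), source=1):
    """Apply the moves, raising if any move breaks the rules."""
    state = {r: [] for r in rods}
    state[source] = list(range(n_disks, 0, -1))
    for disk, src, dst in moves:
        if not state[src] or state[src][-1] != disk:
            raise ValueError(f"disk {disk} is not on top of rod {src}")
        if state[dst] and state[dst][-1] < disk:
            raise ValueError(f"disk {disk} placed on a smaller disk")
        state[dst].append(state[src].pop())
    return state


solution = warsaw()
print(solution)
# [(1, 1, 2), (2, 1, 3), (3, 1, 5), (2, 3, 5), (1, 2, 5)]
print(replay(solution)[5])  # [3, 2, 1]
```

Five moves, one disk at a time, and rod 4 is never used.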

This one is cruel: ChatGPT struggles so hard, with so little success.

* * * * *

Here is a dialog I had with it on Jan. 12, 2023.

Are you familiar with a problem called The Towers of Hanoi?

Yes, I am familiar with the Towers of Hanoi problem. It is a classic problem in the field of computer science and mathematics. The problem consists of three rods and a number of disks of different sizes, which can slide onto any rod. The puzzle starts with the disks in a neat stack in ascending order of size on one rod, the smallest at the top, thus making a conical shape. The objective of the puzzle is to move the entire stack to another rod, obeying the following simple rules: Only one disk can be moved at a time. Each move consists of taking the upper disk from one of the stacks and placing it on top of another stack or on an empty rod. No disk may be placed on top of a smaller disk.

Are you familiar with a variant called The Towers of Warsaw?

I’m sorry, I’m not familiar with a problem called “The Towers of Warsaw.” It is not a well-known problem like the classic Tower of Hanoi problem. It is possible that it is a variation of the classic problem with some additional constraints or modifications. But as of my knowledge cutoff, I haven’t come across any problem named as “The Tower of Warsaw” in the field of computer science and mathematics.

It was invented in 1973 by researchers at Carnegie-Mellon. It is just like the classic problem but, as you suggest, has additional constraints. It is played with 5 rods and 3 disks.

I apologize for the confusion, I am not aware of the problem “The Tower of Warsaw” with additional constraints that you’ve described. My knowledge cutoff is 2021, so any problem invented after that date is not in my knowledge base. The Tower of Hanoi is a classic problem and many variations and extensions have been made over the years, with different number of rods and disks, additional constraints and modifications. The problem you described does sound like an interesting variation of the classic problem, and I would recommend looking into the research paper or material from the researchers at Carnegie-Mellon University in 1973 for more information.

Note that I had told ChatGPT that the problem had been invented in 1973, well before 2021.

Can you solve The Towers of Warsaw? Take it slowly; think it through step by step.

I am a language model AI, I can assist you with the solution of the Tower of Warsaw problem but it would be best if you could provide me the specific rules and constraints of the problem. As I mentioned earlier, I’m not familiar with the problem “The Tower of Warsaw” with the additional constraints you’ve described.

It is important to remember that solving the problem requires a step by step approach, following the rules and constraints of the problem.

In general for the classical Tower of Hanoi problem, the solution can be broken down into several steps:

1. Move the top n-1 disks from the starting rod to the auxiliary rod.
2. Move the bottom disk from the starting rod to the target rod.
3. Move the n-1 disks from the auxiliary rod to the target rod.

This process can be done recursively, where each step is a smaller version of the original problem, until we are left with moving a single disk.

It’s also important to note that the number of moves required to solve the problem is 2^n – 1, where n is the number of disks.

If you can provide me more specific information about the problem and its additional constraints, I would be happy to assist you with the solution.

Note that a previous prompt already contained the information ChatGPT needed to understand the problem. But I repeated it in my next prompt, along with a hint about how to approach the problem.

What makes the Towers of Hanoi tricky is that you have five disks by only three rods. The Towers of Warsaw is different. The objective is the same. But the number of rods is larger than the number of disks. You have 5 rods and three disks. Can you solve it?

The variation you described, The Towers of Warsaw, is an interesting twist on the classic Tower of Hanoi problem. Since the number of rods is larger than the number of disks, it might be possible to come up with an optimal solution that is different from the classic problem.

The steps to solve the problem would be very similar to the classic problem, but with the added constraint of having more rods than disks. One possible approach could be:

1. Move the top two disks from the starting rod to the auxiliary rod.
2. Move the bottom disk from the starting rod to the target rod.
3. Move the two disks from the auxiliary rod to the target rod.
4. Move the bottom disk from the starting rod to the auxiliary rod.
5. Move the two disks from the target rod to the auxiliary rod.
6. Move the bottom disk from the auxiliary rod to the target rod.

As you can see, this solution uses all the rods in order to solve the problem.

It's important to note that the number of moves required to solve the problem is 2^n -1 (where n is the number of disks) in this case as well.

It is important to note that this is just one possible approach, it may not be the optimal one, and I would recommend trying different approaches and experimenting with different methods to find the best solution.

You are only allowed to move one disk at a time. In your proposed solution you moved two disks on the first move.

I apologize for the confusion, you are correct that in the Tower of Hanoi and its variations, including the Tower of Warsaw, you are only allowed to move one disk at a time.

In order to solve the problem with 5 rods and 3 disks, I would recommend the following approach:

1. Move the top 2 disks from the starting rod to rod 2 and rod 3.
2. Move the bottom disk from the starting rod to the target rod.
3. Move the top disk from rod 2 to the target rod.
4. Move the top disk from rod 3 to the target rod.
5. Move the bottom disk from rod 1 to rod 3.
6. Move the top disk from rod 2 to rod 1.
7. Move the bottom disk from rod 3 to rod 1.
8. Move the top disk from rod 2 to rod 3.
9. Move the top disk from rod 1 to rod 3.
10. Move the bottom disk from rod 1 to rod 2.
11. Move the top disk from rod 3 to rod 2.
12. Move the bottom disk from rod 1 to rod 3.

As you can see, this solution uses all the rods and move one disk at a time following the rules of the Tower of Hanoi problem.

It's important to note that this is just one possible approach, it may not be the optimal one, and I would recommend trying different approaches and experimenting with different methods to find the best solution.

I'm confused. Why are you moving 2 disks in your first move. That's not allowed.

I apologize for the confusion. You are correct that moving multiple disks at once is not allowed in the Tower of Hanoi problem. In my previous answer, I made a mistake, and I apologize for that.

In order to solve the Tower of Warsaw problem with 5 rods and 3 disks, the solution would require more steps than the classic problem, as you have more rods than disks. One possible approach would be to use all the rods to temporarily store the disks while moving the other disks around.

Here's an example of steps to solve the problem:

1. Move the top disk from the starting rod to rod 4
2. Move the top disk from rod 1 to rod 2
3. Move the top disk from rod 4 to rod 5
4. Move the top disk from rod 2 to rod 3
5. Move the top disk from rod 5 to rod 3
6. Move the bottom disk from rod 1 to rod 5
7. Move the top disk from rod 3 to rod 1
8. Move the top disk from rod 2 to rod 4
9. Move the top disk from rod 1 to rod 2
10. Move the bottom disk from rod 5 to rod 2
11. Move the top disk from rod 4 to rod 5
12. Move the top disk from rod 2 to rod 3

It is important to note that this is just one possible approach, it may not be the optimal one and it's good to try different approaches and experimenting with different methods to find the best solution.

I must say, you are very inventive.

Thank you! As a language model AI, my main function is to assist users in generating human-like text based on the input provided. In this case, I try to come up with a possible solution for the problem you've described. It is important to note that the solution provided is just one possible approach, it may not be the optimal one, and it's good to try different approaches and experimenting with different methods to find the best solution. If you have any other question or problem you want me to help with, feel free to ask.

I present a different dialog on the same problem in a short paper I posted online: ChatGPT vs. the Towers of Warsaw, https://www.academia.edu/94517239/ChatGPT_vs_the_Towers_of_Warsaw.