
Video About Yann LeCun's “A Path Towards Autonomous Machine Intelligence”

By Bbenzon @bbenzon

Yann LeCun recently released a position paper laying out his vision for deep learning, A Path Towards Autonomous Machine Intelligence (Version 0.9.2, 2022-06-27): https://openreview.net/forum?id=BZ5a1r-kVsf

Abstract: How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons? This position paper proposes an architecture and training paradigms with which to construct autonomous intelligent agents. It combines concepts such as configurable predictive world model, behavior driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

Yannic Kilcher has uploaded a useful video explaining it:

Yann LeCun's position paper on a path towards machine intelligence combines Self-Supervised Learning, Energy-Based Models, and hierarchical predictive embedding models to arrive at a system that can teach itself to learn useful abstractions at multiple levels and use that as a world model to plan ahead in time.

OUTLINE:

0:00 - Introduction
2:00 - Main Contributions
5:45 - Mode 1 and Mode 2 actors
15:40 - Self-Supervised Learning and Energy-Based Models
20:15 - Introducing latent variables
25:00 - The problem of collapse
29:50 - Contrastive vs regularized methods
36:00 - The JEPA architecture
47:00 - Hierarchical JEPA (H-JEPA)
53:00 - Broader relevance
56:00 - Summary & Comments
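
Before moving on to commentary, a concrete toy may help readers who, like me, want some handle on what "energy" means here. The sketch below is my own minimal illustration of a JEPA-style energy computation, not LeCun's code: the module names, dimensions, and fake data are all invented, and I've dropped the latent variable z for brevity. The energy is just the distance between the predicted embedding of the next observation and its actual embedding, and a variance penalty stands in for the "regularized" (non-contrastive) way of avoiding the collapse Kilcher discusses.

```python
# Toy JEPA-style energy sketch (my illustration, not LeCun's code).
import torch
import torch.nn as nn

class ToyJEPA(nn.Module):
    def __init__(self, obs_dim=32, embed_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU(),
                                     nn.Linear(embed_dim, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                       nn.Linear(embed_dim, embed_dim))

    def energy(self, x, y):
        """Low energy when y is a plausible successor of x."""
        sx = self.encoder(x)       # embedding of the current observation
        sy = self.encoder(y)       # embedding of the next observation
        pred = self.predictor(sx)  # predicted next embedding
        return ((pred - sy) ** 2).mean(dim=-1)

def anti_collapse(embeddings):
    """Variance hinge (in the spirit of VICReg): penalize embedding
    dimensions whose batch std falls below 1, so the degenerate
    constant embedding becomes expensive."""
    return torch.relu(1.0 - embeddings.std(dim=0)).mean()

model = ToyJEPA()
x = torch.randn(64, 32)            # batch of "current" observations
y = x + 0.1 * torch.randn(64, 32)  # fake "next" observations
loss = model.energy(x, y).mean() + anti_collapse(model.encoder(y))
loss.backward()                    # self-supervised: no labels anywhere
print(float(loss))
```

Training pushes the predicted embedding toward the actual one, while the variance term blocks the trivial solution in which every observation maps to the same point, the collapse problem in Kilcher's outline.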

* * * * *

During his explanation Kilcher uses learning from video as his main example: the machine is presented with a video clip; what are its options for predicting how the clip will continue? That's an entirely reasonable example, but it is also worlds away from the sorts of things I've thought about: literary texts and abstract ideas.

I bring this up because, as I recall, LeCun's most frequent example is a self-driving car (I haven't actually counted instances), which is obviously a highly salient example but hardly representative of the full range of human reasoning. Near the end of the paper LeCun has a brief discussion of symbols (p. 47):

In the proposed architecture, reasoning comes down to energy minimization or constraint satisfaction by the actor using various search methods to find a suitable combination of actions and latent variables, as stated in Section 3.1.4.

If the actions and latent variables are continuous, and if the predictor and the cost modules are differentiable and relatively well behaved, one can use gradient-based methods to perform the search. But there may be situations where the predictor output changes quickly as a function of the action, and where the action space is essentially discontinuous. This is likely to occur at high levels of abstraction where choices are more likely to be qualitative. A high-level decision for a self-driving car may correspond to “turning left or right at the fork”, while the low-level version would be a sequence of wheel angles.

If the action space is discrete with low cardinality, the actor may use exhaustive search methods. If the action set cardinality, and hence the branching factor, are too large, the actor may have to resort to heuristic search methods, including Monte-Carlo Tree Search, or other gradient-free methods. If the cost function satisfies Bellman’s equations, one may use dynamic programming.

But the efficiency advantage of gradient-based search methods over gradient-free search methods motivates us to find ways for the world-model training procedure to find hierarchical representations with which the planning/reasoning problem constitutes a continuous relaxation of an otherwise discrete problem.

A remaining question is whether the type of reasoning proposed here can encompass all forms of reasoning that humans and animals are capable of.

Yes, it is a question. Without a robust capability for dealing with symbols, no model is going to “encompass all forms of reasoning that humans ... are capable of,” though perhaps Yann’s proposal will encompass animal reasoning. We’ll see.
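
For what it's worth, the gradient-based search in the quoted passage is easy to make concrete. The following is a minimal sketch under invented assumptions: a fixed, differentiable toy world model and cost module, a two-dimensional state, and a five-step plan. It is my gloss on the idea, not LeCun's algorithm.

```python
# Toy "Mode-2" planning: treat a sequence of continuous actions as
# parameters and descend the gradient of the total cost under a
# differentiable world model.
import torch

def world_model(state, action):
    """Stand-in differentiable predictor: next state = state + action."""
    return state + action

def cost(state, goal):
    """Stand-in cost module: squared distance to the goal state."""
    return ((state - goal) ** 2).sum()

goal = torch.tensor([3.0, -1.0])
actions = torch.zeros(5, 2, requires_grad=True)  # a 5-step plan
opt = torch.optim.SGD([actions], lr=0.2)

for _ in range(100):          # energy minimization by gradient descent
    opt.zero_grad()
    state = torch.zeros(2)
    total = torch.tensor(0.0)
    for a in actions:         # roll the world model forward in time
        state = world_model(state, a)
        total = total + cost(state, goal)
    total.backward()
    opt.step()

print(actions.detach())       # actions that steer the state toward the goal
```

If the action set were discrete, “turning left or right at the fork,” this gradient step would give way to exhaustive search or Monte-Carlo Tree Search, which is exactly the tradeoff the passage points at.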

I’m left with the persistent feeling that none of these researchers have thought seriously about language or mathematics, despite the success of large language models. In effect, they leave thinking about language to their models. The existence of large bodies of text allows them to treat texts as the objects of analog, and therefore differentiable, perception – something well worth thinking about from a theoretical point of view. All they think about are their gradient-based architectures. Reasonable enough, I suppose, but it’s no way to scale Mount AGI.

