From Words to World: The Case for Embodied AI (Part 1)


“We know not through our intellect but through our experience.”
~Maurice Merleau-Ponty, French philosopher

Recently, “What’s Right with the Future” has been looking at some potential alternatives to the current crop of generative AI models that have gained so much attention and funding during the past few years. Two of the alternatives discussed—Spatial AI and Large World Models—are related efforts that attempt to advance AI by relying less exclusively on language and more on interactions with the world. In doing this, researchers hope to build AI that learns much more like we do. Today, I want to explore another proposal for advancing machine intelligence based on a similar rationale: Embodied AI.

Intelligence has been defined in many different ways over the years, but it’s most consistently associated with learning and the ability to adapt to the world around us. Though we rarely give this much thought, most of us would agree that it helps, first of all, to have a physical body.

From our earliest moments as children, we sense, observe and interact with the world around us. Through these interactions, we rapidly acquire an understanding of our environment, of those things that are not us, and of our relationship to them. How objects move, either on their own or by forces acting upon them. The ways light and sound shift around us. The weight of a stone or the resistance of a lever. The scent of a rose or a loaf of fresh-baked bread. The ring of laughter in a child’s playroom. All of it teaches us about physical relationships, social interactions, and causality in a way that simply can’t be represented by data, vectors and tokens. The difference between measuring something and actually experiencing it is literally phenomenal.

Now try to imagine what this process would be like without a body or the ability to sense the world around you. Despite your best efforts, your perception of the world would be markedly different—if you experienced it at all.

Which is why some researchers have begun focusing on the idea that embodiment may be essential not only for learning through interaction with the world, but also for helping another intelligence align with our own.

Embodied intelligence begins with a simple idea. Human intelligence doesn’t float in isolation. It emerges from a system that can sense and interact with the world, learning from the consequences of its actions. Every experience in our lives creates us, forming an ever-shifting gestalt that is at once unique and universal.

As conceived, embodied AI would do much the same thing. Gathering information through sensors, making decisions, taking action in interactive physical or simulated environments, it would learn by doing. These interactions would generate a feedback loop. A loop that allows intelligence to form.

The language-based approaches of large language models (LLMs) achieve a mere shadow of this. While we human beings do rely heavily on language and symbolic thinking, we don’t do so exclusively. That’s just one part of our highly adaptable intelligence. While it’s true that by juggling language in our minds, we can build elaborate structures of meaning from words and symbols, these remain abstract and imperfect representations. Modern AI, especially large language models, does something similar, but has no other features of intelligence to fall back on. It recombines patterns found in vast stores of text and images, and by doing this, can achieve amazing things. But that knowledge of the world remains indirect. It’s an abstraction built from our own writings and other recorded works, not from the AI directly experiencing anything. (This is true not just for LLMs, but for large multimodal models as well.) This experience gap routinely produces errors. Errors that, while sounding very convincing, are often very wrong.

Embodied AI seeks to close that gap by building up a commonsense model of the world it interacts with, learning through the cause and effect of experience. As conceived, that learning could be far more reliable and transparent than any approach we’re using today. And because embodied systems can share what they discover, that experience will be able to inform and strengthen other AIs as well. Generating intelligence that is not just recorded, not just predicted, but lived.

In the next installment, we’ll look at some of the research and leaders who are driving this nascent field and their approaches, as well as the many use cases embodied AI may lead to over the coming decades.