The Next Chapter in the Story of AI
What do you know about Spatial AI?
As we enter the second quarter of the 21st century, a new form of artificial intelligence is appearing on the scene, one that could advance the field tremendously. This new approach combines computer vision and depth sensing with 3D mapping to transform AI from being a tool that mostly understands information into one that understands the world.
One of the great things about studying and working with the future is that it’s always changing. And nowhere is that more apparent than in the field of artificial intelligence. Since its beginnings in the mid-20th century, the field has built up incrementally, through breakthroughs and AI winters alike. More than a decade ago the pace of change seemed to accelerate significantly, at least in terms of outward progress, largely due to advances in neural networks, architectures inspired by the interconnections between the neurons in our own brains. But it was the rise of generative AI over the past few years that saw artificial intelligence truly expand in ways that increasingly challenge our policies, our society, our businesses, and our imaginations. It’s been a wild ride, and it’s far from over.
Since generative AI exploded onto the scene, many people have come to see it as the ultimate tool: a form of machine intelligence with seemingly infinite possibilities. Researchers, businesses, journalists, and especially the public have committed ever more attention and resources to it. Many leaders of companies like OpenAI, Google, Meta, and Nvidia have advocated an almost unlimited scaling up of these systems, in the hope that there is a threshold beyond which everything becomes possible. They want to keep scaling GPUs, data centers, models, tokens, electricity and water consumption, and much more, all based on the mindset that bigger will be better, and more importantly that it will bring new levels of advanced machine intelligence.
But many other AI scientists and researchers don’t buy this. Recognizing that AI has always been a field of multiple approaches, some believe the large-model approach will eventually lead to a dead end, one that is currently sucking all the air, financing, and talent out of the room, leaving little for alternative approaches to development. Some of these scientists are beginning to act on that belief.
In 2024, Professor Fei-Fei Li of Stanford, along with Justin Johnson, Christoph Lassner, and Ben Mildenhall, founded World Labs, a company dedicated to advancing the field of spatial AI. Li is frequently referred to as the godmother of AI, partly due to her role in creating ImageNet nearly two decades ago, the large labeled image database that helped advance machine vision development.
At World Labs, Li and her team are working to change fundamental aspects of how AI learns, a paradigm shift that could prove important in moving past some of the field’s current hurdles. Today, neural networks, transformers, and large models achieve their superhuman capabilities through statistical methods applied to the immense body of information humanity has poured onto the Internet over the past four decades, training models to emulate at least some aspects of our own intelligence. Information good, bad, and suspect alike, in corpuses of data well beyond the human mind’s ability to conceive or imagine.
It also bears little resemblance to the way that people learn.
From the time we’re born (and possibly earlier), we are continuously building a mental map of our environment. As we begin to interact with the world and move through it, we learn and make mistakes. We acquire new insights and associations. We become connected to our world and our culture. We learn about the objects around us, the physics that direct them, the nature of cause and effect, both physically and conceptually. We literally assemble a model of the world around us in our minds in real time.
The building of this mental digital twin never stops. The depth and breadth of this world, its relationships, interactions, experiences, and emotions, are something that will never be fully captured in a matrix of values or probabilities. And even if that ever did become possible, the enormous differences in how those experiences were acquired and modeled would prevent building a truly accurate representation that emulates our own.
Probably the biggest difference with spatial AI is that it makes it possible for digital systems to understand how objects relate in both space and time. This allows such a system to infer intent and possible future actions from the nature and proximity of the objects it observes. For instance, if a person is standing next to a car, there’s a good chance they will soon open the car door and get inside. This is a huge departure from how current LLMs work, predicting the next most probable word in a sequence. Context becomes less about how words and tokens relate to each other and more about the relationships between objects.
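To make the person-and-car example concrete, here is a deliberately simplistic sketch, not drawn from World Labs or any real spatial AI system, of reasoning over object types and distances rather than token sequences. Every name, rule, and threshold in it is a hypothetical illustration:

```python
import math

def distance(a, b):
    """Euclidean distance between two 2D positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def infer_likely_action(scene, near_threshold=1.5):
    """Guess a plausible next action from spatial relationships.

    scene: list of (label, (x, y)) tuples for detected objects.
    The single hand-written rule below stands in for what a real
    spatial model would learn; it is purely illustrative.
    """
    objects = dict(scene)
    person = objects.get("person")
    car = objects.get("car")
    if person is not None and car is not None:
        if distance(person, car) < near_threshold:
            return "person may open the car door and get inside"
    return None

# A person roughly one unit away from a car triggers the rule;
# move the car far away and the function returns None.
scene = [("person", (0.0, 0.0)), ("car", (1.0, 0.5))]
print(infer_likely_action(scene))
```

The toy only encodes space, not time, but it shows the shift in what "context" means: the input is a set of objects and positions, and the prediction is an action, not the next word.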
Spatial AI seeks to develop a new way for machines to learn, one modeled on how you and I do it. The hope is that by doing this we can not only advance AI to the next level, but also make it align better with our own human values and priorities, because it will understand the world more as we do.
World Labs is far from the only player in this space. Recently, Yann LeCun, who is frequently called one of the godfathers of AI, launched Advanced Machine Intelligence Labs. The former longtime head of FAIR (Facebook AI Research), LeCun has long been outspoken about the limitations of current approaches to generative AI, believing they will eventually hit an insurmountable limit.
AMI Labs’ mission is to develop world models in order to “build intelligent systems that understand the real world.” The company maintains that “real intelligence does not start in language. It starts in the world.” LeCun recently noted that “the approaches that have been successful for language, do not work for highly dimensional continuous noisy data.” In other words, the world around us.
This will be essential as various forms of embodied AI become common. From autonomous vehicles to robot co-workers, most current approaches are brittle, incapable of dealing with all the noisy, unexpected edge cases of the everyday world. As the story of artificial intelligence continues to be written, hopefully spatial AI will one day overcome these limits, so that our tools can become not only more intelligent, but more useful as well.
