Embodied intelligence refers to the ability of robots to combine physical actions with advanced reasoning, perception, and decision-making. Unlike traditional machines that follow fixed commands, embodied robots can interpret complex instructions, adapt to new situations, and perform tasks with human-like flexibility. With a growing number of companies and nations entering the arena, the race for embodied intelligence is intensifying, and practical systems appear set to arrive in the not-too-distant future.

The Embodied Intelligence Race
Tesla is placing huge weight on its humanoid robot project, Optimus. The company expects it to become a major part of its future, representing up to 80% of Tesla’s overall value. Elon Musk has set an ambitious target: producing one million humanoid robots per year by 2030. Optimus is designed primarily for factory use but is also being developed for consumer applications, ranging from household assistance to personal support.
Boston Dynamics, long known for its groundbreaking robotic designs, has turned its Atlas model into a fully electric humanoid platform. Atlas is capable of performing complex multi-task demonstrations, such as handling tools or navigating obstacle courses. With Toyota Research Institute's Large Behavior Model supporting its development, Atlas is being positioned as an advanced industrial solution. However, like Tesla's Optimus, Atlas still faces the challenge of moving beyond staged demonstrations toward everyday usability.
NVIDIA is entering the robotics race with a software-first approach. Its Isaac platform and GR00T foundation models provide robotics developers with tools for simulation, reasoning, and adaptability. NVIDIA describes GR00T as “the next wave of AI,” aiming to make robots more capable of decision-making in real-world settings. This positions NVIDIA as a key enabler of robotics rather than a robot maker, offering the “brains” for embodied intelligence.
The Challenge: Demos vs. Dependability
Despite impressive demonstrations of robots running, jumping, or manipulating tools, most humanoid robots struggle with real-life tasks. Simple activities like folding laundry, serving food, or caregiving remain much harder than advertised. The gap between marketing demos and practical dependability is still wide, slowing adoption in homes and industries.
China’s Disruptive Entry: X Square Robot and Wall-OSS
A new player from China, X Square Robot, is taking a bold step with its open-source model, Wall-OSS. This is China’s first foundation model for embodied intelligence and is designed to make robots adaptable in unpredictable real-world environments. Unlike proprietary approaches from Tesla or Boston Dynamics, Wall-OSS will be freely available on platforms like GitHub and Hugging Face.

How Wall-OSS Works
Wall-OSS pairs a Shared Attention Mechanism, which focuses only on relevant cues to improve reaction time and reduce errors, with a Task-Routed Feed-Forward Network (FFN). In most legacy systems, all sensory streams (camera vision, spoken instructions, motor commands) are funneled through a single processing layer, forcing the model to juggle unrelated inputs in one space. The result is bottlenecks, slow adaptation, and a tendency to misprioritize commands.
Wall-OSS addresses this differently. Shared attention allows the model to selectively focus on the most relevant cues in a scene, while the task-routed FFN processes each type of input (vision, language, and motor actions) through a specialized pathway. Visual data flows along a pathway optimized for object recognition and spatial mapping; linguistic commands are parsed through a separate pathway; and motion processing operates independently, accounting for physical constraints and real-time feedback.
This enables robots to understand and act in context, such as following the instruction “Pick up the apple on the table and place it in the bowl” as one coherent task instead of treating “see apple” and “pick” as separate ones. Routing multimodal inputs in this way mirrors human cognition, where vision, hearing, and motor planning integrate seamlessly. For robots, it translates to faster response times, fewer errors, and better performance in unfamiliar environments.
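To make the routing concrete, here is a minimal PyTorch sketch of the pattern described above: a single attention layer shared across all modalities, followed by a separate feed-forward pathway per modality. This illustrates the general technique only, not X Square Robot's actual implementation; all module names, dimensions, and modality tags are assumptions.

```python
import torch
import torch.nn as nn

class TaskRoutedBlock(nn.Module):
    """Toy transformer block: one attention layer shared across all
    modalities, plus a specialized feed-forward pathway per modality."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Shared attention: every token (vision, language, action)
        # attends over the full multimodal sequence.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Task-routed FFNs: one pathway per input type.
        self.ffn = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
            for name in ("vision", "language", "action")
        })

    def forward(self, x: torch.Tensor, modalities: list[str]) -> torch.Tensor:
        # x: (batch, seq, dim); modalities tags each token in the sequence,
        # e.g. ["vision", "vision", "language", "action"]
        h, _ = self.attn(x, x, x)            # shared cross-modal attention
        x = self.norm1(x + h)
        out = torch.zeros_like(x)
        for name, ffn in self.ffn.items():   # route tokens by modality
            mask = torch.tensor([m == name for m in modalities],
                                device=x.device)
            if mask.any():
                out[:, mask] = ffn(x[:, mask])
        return self.norm2(x + out)

block = TaskRoutedBlock()
tokens = torch.randn(1, 4, 256)              # four multimodal tokens
print(block(tokens, ["vision", "vision", "language", "action"]).shape)
```

The design choice the sketch highlights is the split: attention mixes information across modalities in one shared space, while the per-modality FFNs keep the heavy transformation work specialized, so visual tokens are never forced through the same weights as motor tokens.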
Another key feature is Chain-of-Thought (CoT) reasoning integrated into the architecture, where the robot plans out multi-step actions before execution. Traditional systems often treat a command like “clean the table” as a one-off trigger: the robot may pick up the tableware and then forget to wipe the surface. CoT reasoning instead lets the model simulate the full sequence before moving, generating an internal plan rather than reacting in isolation. Responding to “clean the table” then involves recognizing clutter, sorting items, removing dishes, and wiping the surface, all handled logically rather than through trial and error.
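A toy Python sketch of the idea: the plan is produced in full before any motion command is issued. The planner below is a hard-coded stub standing in for the model's reasoning step, and the robot.do call is hypothetical.

```python
def plan(command: str) -> list[str]:
    """Stub planner: canned decompositions for illustration only.
    In a CoT model, this plan would be generated by the model itself."""
    plans = {
        "clean the table": [
            "scan the table and list the items on it",
            "sort items: dishes vs. trash vs. things to keep",
            "carry dishes to the sink",
            "discard the trash",
            "wipe the surface",
        ],
    }
    return plans.get(command, [command])     # unknown commands pass through

def execute(command: str) -> None:
    steps = plan(command)                    # reason about all steps first...
    for i, step in enumerate(steps, 1):      # ...then act on them in order
        print(f"step {i}: {step}")
        # robot.do(step)  # hypothetical actuator call

execute("clean the table")
```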
Multi-step planning, of course, extends beyond simple housekeeping. In industrial or service settings, it allows robots to adapt without explicit programming: a warehouse bot could work out how to stack differently sized packages without knocking them over, while a healthcare assistant could prepare instruments by following procedural sequences.
Training for Real-World Adaptability
Unlike older systems trained on narrow datasets, Wall-OSS has been trained on billions of Vision-Language-Action (VLA) samples drawn from real-world robotic logs, generative videos, and synthetic environments with varied lighting, textures, and clutter. This broad training gives the model resilience beyond controlled labs: Wall-OSS-powered robots are less likely to falter when faced with unusual household layouts, different object shapes, or sudden changes in context.
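For illustration, a VLA sample can be pictured as a paired record of what the robot saw, what it was told, and what it did. The schema below is a hypothetical sketch; the field names and format are assumptions, not Wall-OSS's actual data layout.

```python
from dataclasses import dataclass

@dataclass
class VLASample:
    """Hypothetical Vision-Language-Action training record."""
    frames: list[str]          # camera frames from logs or synthetic scenes
    instruction: str           # natural-language command for the episode
    actions: list[list[float]] # recorded joint/gripper commands per timestep
    source: str                # "robot_log" | "generated_video" | "synthetic_env"

sample = VLASample(
    frames=["frame_0.png", "frame_1.png"],
    instruction="pick up the apple on the table and place it in the bowl",
    actions=[[0.1, -0.3, 0.7], [0.0, -0.1, 0.9]],
    source="synthetic_env",
)
print(sample.instruction)
```

Mixing sources in one schema is what lets synthetic scenes with randomized lighting and clutter stretch the training distribution beyond what real robot logs alone could cover.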
The Quanta X2 Robot
To showcase Wall-OSS, X Square Robot has developed the Quanta X2. This robot combines a wheeled base, a 7-degree-of-freedom arm, and a dexterous hand capable of lifelike gestures. The platform offers up to 62 degrees of freedom for natural movement and features rotating clamps for 360° cleaning. The Quanta X2 is designed for use in service industries, households, and industrial settings.
Open-Source vs. Proprietary Futures
While other robotics companies maintain proprietary stacks, X Square Robot is set to release Wall-OSS openly on GitHub and Hugging Face. If adoption spreads, Wall-OSS could position open frameworks as a credible counterpoint to closed systems.
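Once the weights are published, pulling them should look like any other Hugging Face release. The repository id below is a placeholder guess, not a confirmed path; check X Square Robot's announcement for the actual one.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id; the real Wall-OSS release path may differ.
local_dir = snapshot_download(repo_id="x-square-robot/wall-oss")
print("model files downloaded to:", local_dir)
```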
With roughly US $100 million in funding, X Square Robot is betting that open-source collaboration can solve what proprietary efforts have not: dependable performance in unpredictable environments. For startups, Wall-OSS provides a ready-made brain; for hardware makers, it offers a generalizable intelligence layer. The race is no longer about choreographed performances but about bridging the gap between shallow demos and real-world needs.