Embodied intelligence refers to intelligent agents that have a physical or virtual body and interact with their environment through continuous sensing, decision-making, and action. Unlike traditional AI that processes static data, embodied intelligence emphasizes the dynamic loop where perception guides action, and action changes what is perceived. This interaction involves three key components: intelligence (the computational brain), embodiment (the physical or simulated body), and environment (the external world with its objects and dynamics).
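The closed loop described above can be made concrete with a minimal sketch. The toy 1-D world, the `Environment` class, and the `decide` rule below are all illustrative inventions, not any particular robotics API; the point is only the cycle in which perception guides action and action changes what is perceived next.

```python
class Environment:
    """A toy 1-D world: the agent must reach a goal position."""
    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def sense(self):
        # Perception: observe the signed distance to the goal.
        return self.goal - self.position

    def apply(self, action):
        # Action changes the world, which changes the next observation.
        self.position += action

def decide(observation):
    # Decision-making: step toward the goal.
    if observation > 0:
        return 1
    if observation < 0:
        return -1
    return 0

env = Environment(goal=5)
for _ in range(10):
    obs = env.sense()   # perception
    act = decide(obs)   # decision
    env.apply(act)      # action closes the loop

print(env.position)  # → 5
```

Even in this trivial setting, the agent's behavior emerges from the interaction of all three components: the decision rule alone says nothing about the outcome without the body's position and the environment's dynamics.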
Recent advances have improved each of these components. Intelligence now often involves large-scale machine learning models that enhance perception and decision-making. Embodiment has diversified, with robots ranging from precision manipulators to agile quadrupeds and humanoids; body design strongly shapes what an agent can sense and do. The environment is modeled with sophisticated 3D and semantic understanding, supported by high-fidelity simulators that help bridge the gap between virtual training and real-world deployment.
The core capabilities of embodied intelligence are embodied perception, decision-making, and action. Perception is active and task-driven, combining multiple sensory inputs to build a rich understanding of the world. Decision-making involves selecting goals and actions under real-world constraints, often using hierarchical planning or end-to-end learning approaches. Action translates decisions into coordinated movements, leveraging imitation and reinforcement learning to improve skills and adapt to new situations.
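As a concrete instance of learning action from interaction, the sketch below runs tabular Q-learning on the same kind of toy 1-D world. The state space, reward values, and hyperparameters are all illustrative assumptions; real embodied agents face continuous states and actions, but the update rule is the standard Q-learning form.

```python
import random

random.seed(0)

# Toy world: states 0..6, goal at state 5; actions move left or right.
GOAL, N_STATES = 5, 7
ACTIONS = [-1, 1]
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else -0.1  # small cost per step
    return nxt, reward, nxt == GOAL

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit, occasionally explore.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        nxt, r, done = step(s, a)
        best_next = max(Q[(nxt, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt

# Greedy rollout after training: follow the learned policy to the goal.
s, path = 0, [0]
while s != GOAL and len(path) < 20:
    a = max(ACTIONS, key=lambda a: Q[(s, a)])
    s, _, _ = step(s, a)
    path.append(s)
print(path)
```

The skill here is trivial, but the structure mirrors the text: trial-and-error interaction improves the action policy, and the improved policy changes which states the agent subsequently experiences.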
Looking ahead, the field focuses on several priorities: developing end-to-end learning systems that map raw sensory data directly to actions; integrating multimodal sensing like vision and touch for more robust perception; designing adaptive morphologies that combine rigidity and softness for dexterity and resilience; and improving real-world generalization so agents can adapt to complex, changing environments.
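Two of these priorities, end-to-end mapping and multimodal fusion, can be sketched together. The feature values and weight matrix below are made up for illustration; in a real system the weights would be learned end-to-end rather than fixed, and the modalities would be high-dimensional camera and tactile streams.

```python
import math

# Hypothetical per-modality features (values are illustrative only).
vision_features = [0.2, 0.7, 0.1]  # e.g. object-position estimate from a camera
touch_features = [0.9]             # e.g. contact pressure from a tactile sensor

# Early fusion: concatenate modalities into one observation vector.
observation = vision_features + touch_features

# A linear policy maps the fused observation directly to motor commands.
# These weights stand in for parameters a learned policy would acquire.
weights = [
    [0.5, -0.2, 0.1, 0.3],   # row producing motor command 1
    [-0.1, 0.4, 0.2, -0.5],  # row producing motor command 2
]

def policy(obs):
    raw = [sum(w * x for w, x in zip(row, obs)) for row in weights]
    # Squash to a bounded range, since real actuator commands are bounded.
    return [math.tanh(v) for v in raw]

commands = policy(observation)
print(len(commands))  # one bounded command per actuator
```

Early fusion by concatenation is only the simplest design choice; richer architectures fuse modalities at intermediate layers, but the end-to-end principle is the same: raw sensory data in, motor commands out, with no hand-designed intermediate representation.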
Embodied intelligence aims to create general-purpose, self-directed agents that continuously learn and adapt by tightly coupling perception, decision, and action within their bodies and environments. In doing so, it moves AI from static models to active, real-world actors.