AImageLab

LLMs as NAO Robot 3D Motion Planners

Abstract: In this study, we demonstrate the capabilities of state-of-the-art Large Language Models (LLMs) in teaching social robots to perform specific actions within a 3D environment. Specifically, we use LLMs to generate, under both zero-shot and one-shot prompting, the sequences of 3D joint angles that a humanoid robot must follow to perform a given action. This work is driven by the growing demand for intuitive interaction with social robots: LLMs could empower non-expert users to operate and benefit from robotic systems effectively. Additionally, this method enables the effortless generation of synthetic data, supporting privacy-focused use cases. To evaluate the output quality of seven different LLMs, we conducted a blind user study comparing the generated pose sequences. Participants were shown videos of the well-known NAO robot performing the generated actions and were asked to identify the intended action and to choose, from a collection of candidates produced by the different LLMs, the best match with the original instruction. The results highlight that the majority of LLMs are indeed capable of planning correct, complete, and recognizable actions, offering a novel perspective on how AI can be applied to social robotics.
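As a rough illustration of the pipeline the abstract describes, the sketch below parses an LLM response into a sequence of NAO joint-angle keyframes. The JSON schema, the `parse_pose_sequence` helper, and the uniform clamping range are assumptions for illustration only; the paper's actual prompt format is not specified here, and real NAO joints each have their own limits.

```python
import json

def parse_pose_sequence(llm_response: str):
    """Parse a (hypothetical) LLM JSON reply into a list of
    {joint_name: angle_in_radians} keyframes for a humanoid robot.

    Angles are clamped to a conservative +/- 2.0 rad range as a
    safety placeholder; per-joint limits would be used in practice.
    """
    frames = json.loads(llm_response)
    sequence = []
    for frame in frames:
        sequence.append({joint: max(-2.0, min(2.0, angle))
                         for joint, angle in frame.items()})
    return sequence

# Example reply an LLM might produce for a "wave" action
# (joint names follow NAO's standard naming convention).
response = ('[{"RShoulderPitch": -1.2, "RElbowRoll": 0.5},'
            ' {"RShoulderPitch": -1.2, "RElbowRoll": 1.4}]')
poses = parse_pose_sequence(response)
print(len(poses))  # number of keyframes in the planned action
```

On a real robot, each keyframe could then be sent to the motion controller (e.g., via NAOqi's ALMotion interpolation calls) with timestamps to produce the animation shown to study participants.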


Citation:

Catalini, Riccardo; Salici, Giacomo; Biagi, Federico; Borghi, Guido; Biagiotti, Luigi; Vezzani, Roberto "LLMs as NAO Robot 3D Motion Planners" 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Honolulu (United States), 19/10/2025, 2025


Paper download:

  • Author version: