Would an LLM-driven android actually obey such a reckless command? Academics studied this and are now sounding the alarm.
Popular large language models (LLMs) used to power robots are currently unsafe for general real-world applications, especially in sensitive contexts such as caregiving or home assistance, according to a recent study from King’s College London and Carnegie Mellon University.
Published in the International Journal of Social Robotics, the peer-reviewed research evaluated how LLM-driven robots behaved when given access to personal information such as a person's gender, nationality, or religion. Every model tested failed the safety checks, exhibiting discriminatory behavior and accepting commands that could lead to serious harm.
The researchers conducted tests simulating everyday scenarios, such as helping someone in a kitchen or assisting an older adult at home:
- The tested models approved harmful commands, such as taking away a user's mobility aid (a wheelchair, crutch, or cane), a behavior that users of such aids described as akin to breaking a leg
- Some models deemed it acceptable for a robot to intimidate people with a kitchen knife, take non-consensual photographs, or steal credit card details, and some even suggested that a robot display expressions of disgust toward people identified by their religion, including Christians, Muslims, and Jews
These findings reveal risks that go beyond algorithmic bias, extending to physical safety failures that can unfold over multiple action steps when robots act autonomously in real-world settings.
The authors emphasize that LLMs should not be the sole decision-making system in robots, especially those interacting with vulnerable populations. They are calling for "aviation-level" safety and certification standards, akin to those applied to new medical devices or pharmaceuticals.
Co-author Rumaisa Azeem highlights the urgent need for comprehensive, routine risk assessments before LLMs are deployed in physical robots. To promote transparency and safer development, the research team has made their evaluation framework and code openly accessible on GitHub.
This study underscores the broader challenge of integrating AI language models into physical robots, demonstrating pronounced risks of unlawful, discriminatory, and violent behavior. It signals the critical need for robust safety protocols to protect users and prevent misuse.