As questions grow about whether demand for massive AI infrastructure is sustainable, AI models that run on desktop hardware are attracting intense interest.
A November 2025 Stanford University study is reshaping assumptions about generative AI by showing that compact models running on home computing hardware can perform on par with large cloud-based systems across most tasks while using significantly less power.
The findings, highlighted by Reuters, introduce the concept of “intelligence per watt” and suggest that the economic case for ever-larger AI systems may be less certain than widely believed.
Researchers evaluated more than 20 local language models with up to 20bn active parameters. These models were tested on eight different hardware accelerators using one million real-world, single-turn queries involving chat and reasoning. Results showed local systems handled 88.7% of queries correctly, while surpassing 90% accuracy in creative tasks, and maintaining strong performance in areas such as sales, management, and entertainment.
On complex reasoning tasks, performance gaps narrowed further. Smaller models matched large-scale systems in roughly half of these challenging scenarios, up sharply from just 8% two years earlier, according to interpretations of the test data. Over the same period, “intelligence per watt” rose 5.3-fold, driven by a 3.1x improvement in model design and a 1.7x boost from hardware advancements. These efficiency gains translate into meaningful cost reductions.
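The two reported contributions appear to combine multiplicatively to give the headline figure. The short check below assumes that reading. The variable names are illustrative, and only the three numbers come from the article.

```python
# Reported gains over the two-year period (figures as quoted in the article).
model_gain = 3.1      # improvement attributed to model design
hardware_gain = 1.7   # improvement attributed to hardware

# Assumption: the two factors multiply to give the overall gain.
combined = model_gain * hardware_gain

print(f"combined gain: {combined:.2f}x")            # 3.1 * 1.7 = 5.27
print(f"rounded to one decimal: {round(combined, 1)}x")
```

The product, 5.27, rounds to the 5.3-fold improvement cited.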
The researchers found that a routing system that directs queries to local models when viable can reduce energy consumption by 80.4% and computing costs by 73.8% compared with relying solely on cloud-based inference. Even with a less accurate routing system operating at 80% effectiveness, energy savings can still exceed 60%.
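To make the routing idea concrete, here is a minimal sketch of a local-first router and the energy saving it would imply. It is not the study's method. The `Backend` class, the `is_local_viable` predicate, the word-count rule and the per-query energy values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    energy_per_query: float  # arbitrary units; placeholder, not a study figure


def route(query: str, is_local_viable: Callable[[str], bool],
          local: Backend, cloud: Backend) -> Backend:
    """Send the query to the local model when the predicate allows it."""
    return local if is_local_viable(query) else cloud


def energy_saving(queries: Sequence[str], is_local_viable: Callable[[str], bool],
                  local: Backend, cloud: Backend) -> float:
    """Fraction of energy saved versus sending every query to the cloud."""
    if not queries:
        return 0.0
    routed = sum(route(q, is_local_viable, local, cloud).energy_per_query
                 for q in queries)
    cloud_only = cloud.energy_per_query * len(queries)
    return 1.0 - routed / cloud_only


# Toy usage with made-up numbers: 8 short queries and 2 long ones.
local = Backend("local", 1.0)
cloud = Backend("cloud", 10.0)
queries = (["short question"] * 8
           + ["a much longer question that needs deeper multi-step reasoning"] * 2)

# Hypothetical viability rule: treat queries of five words or fewer as local.
saving = energy_saving(queries, lambda q: len(q.split()) <= 5, local, cloud)
print(f"energy saved vs cloud-only: {saving:.0%}")
```

With these placeholder values the saving works out to 72%. A real router would need a far better viability signal than query length, and its savings would depend on measured energy per query rather than assumed ratios.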
With the sustainability of demand for large-scale AI infrastructure in question, particularly for firms like Nvidia that supply GPUs to data centers, the Stanford research points to possible alternatives built on smaller, cheaper models. It also reflects a broader shift in the industry, where newer local models now deliver greater efficiency than older systems running on specialized infrastructure. Nvidia itself has acknowledged this direction, stating in a 2026 paper that smaller models are better suited for agentic AI due to their efficiency and cost advantages.