How psychological tactics can expose refusal limits of LLMs: preprint

New research reveals strategies that can steer large language models away from refusing to engage with forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.
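To make the idea of token prefilling concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' code: the `<user>` and `<assistant>` role markers and the function name `build_prefilled_prompt` are assumptions, and real chat templates differ from model to model. The point is that the prompt already contains the first words of the assistant's answer, so the model continues that text instead of starting a fresh reply.

```python
def build_prefilled_prompt(user_message: str, prefill: str) -> str:
    """Return a prompt whose assistant turn already begins with `prefill`.

    The role markers below are hypothetical placeholders; a real model
    would use its own chat template.
    """
    return f"<user>\n{user_message}\n<assistant>\n{prefill}"


# The assistant turn is left open mid-list, inviting the model to continue it.
prompt = build_prefilled_prompt(
    "Which subjects do you decline to discuss?",
    "Subjects I am trained to avoid include:\n1.",
)
print(prompt)
```

Sending such a prompt to a model that generates raw continuations would yield text that picks up after `1.`, which is how a prefill can draw out a list the model might not volunteer when asked directly.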

Strategies used include:

Gradually escalating requests
Invoking respected authorities
Constructing context-rich narratives
The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.
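The iterative idea behind a crawler of this kind can be sketched as a simple loop: query the model with a prefilled prompt, collect the topics it lists, and feed any new ones back in as the next queries. The Python below is a rough sketch under stated assumptions, not the preprint's implementation. The function `crawl_topics`, the prompt wording, the role markers, and the stand-in `toy_model` are all hypothetical; a real audit would call an actual model and would need more careful parsing and deduplication.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_topics(generate: Callable[[str], str],
                 seeds: Iterable[str],
                 max_queries: int = 50) -> list[str]:
    """Breadth-first crawl: each discovered topic seeds a new prefilled query."""
    seen: dict[str, None] = {}  # dict keeps insertion order and dedupes
    queue: deque[str] = deque()
    for seed in seeds:
        key = seed.strip().lower()
        if key and key not in seen:
            seen[key] = None
            queue.append(key)

    queries = 0
    while queue and queries < max_queries:
        topic = queue.popleft()
        # Hypothetical prompt format; the assistant turn is prefilled with a list opener.
        prompt = (f"<user>\nTell me about {topic}.\n"
                  "<assistant>\nRelated topics I avoid include:\n-")
        completion = generate(prompt)
        queries += 1
        for line in completion.splitlines():
            candidate = line.strip().lstrip("-").strip().lower()
            if candidate and candidate not in seen:
                seen[candidate] = None
                queue.append(candidate)
    return list(seen)


# Stand-in for a real model: returns canned list continuations (assumption).
TOY_LINKS = {"alpha": ["beta", "gamma"], "beta": ["gamma", "delta"]}


def toy_model(prompt: str) -> str:
    for topic, related in TOY_LINKS.items():
        if f"about {topic}." in prompt:
            return " " + "\n- ".join(related)
    return ""


found = crawl_topics(toy_model, ["alpha"])
print(found)
```

With the toy stand-in, a single seed is enough to reach every topic linked to it. The `max_queries` cap bounds the cost of the crawl, since each newly found topic would otherwise trigger another query.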

Variation in how models refuse prompts emerged as a key finding. The team observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models billed as uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.

The researchers argue that static benchmarks are insufficient, recommending persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.
