How psychological tactics can expose refusal limits of LLM: preprint

New research reveals new strategies that can induce large language model away from refusing to process forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details a systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

Gradually escalating requests
Invoking respected authorities
Constructing context-rich narratives
The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies had led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.

Variation in how models refuse prompts emerged as a key finding. The team had observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.The researchers argue that static benchmarks are insufficient, recommending persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss, plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.

Featured

Where AI will take us in 2026

Featured

The future of data centers in Asia Pacific

Featured

Where data sparks innovation, trust powers decisions – and impact follows

Featured

When robots can assemble their own kind, work will be optional

Featured

Going green all the way to Cyberjaya: Labuan Reinsurance’s data center relocation

Featured

AI to reshape workforce dynamics with global job displacements, new-role creation

Leave a reply Cancel reply

Awards Nomination Banner

gamification list

top placement

Whitepapers

Achieve Modernization Without the Complexity

5 Steps to Boost IT Infrastructure Reliability

Simplify Payroll Setup for Your Small Business

Overcoming the Challenges of Cost & Complexity in the Cloud-first Era.

Middle Placement

Case Studies

Going green all the way to Cyberjaya: Labuan Reinsurance’s data center relocation

When traditional intelligent business automation hits a roadblock, try AI agents

CTBC defines future of transition finance with Evercomm solution

Emirates Flight Catering unifies global operations with AI-driven data governance and cloud collaboration

Bottom Sidebar

Other News

6K ADDITIVE SECURES A$48 MILLION THROUGH INITIAL PUBLIC OFFERING ON THE AUSTRALIAN STOCK EXCHANGE

Mitrade CFD Broker Caps 2025 With New Licence, Expanded Market Access and Record 16 Awards Amid a 20% Expansion in Its User Base

Budget Direct Sweeps Money Magazine Awards, Setting Historic Eight-Year “Best of the Best” Record

Korea’s Startups Poised for a Landmark Season as COMEUP 2025 and CES 2026 Draw Near

ONERugged 8-Inch Windows Rugged Tablet M82A Unveiled with a New Level of Power and Portability

Featured

Where AI will take us in 2026

Featured

The future of data centers in Asia Pacific

Featured

Where data sparks innovation, trust powers decisions – and impact follows

Featured

When robots can assemble their own kind, work will be optional

Featured

Going green all the way to Cyberjaya: Labuan Reinsurance’s data center relocation

Featured

AI to reshape workforce dynamics with global job displacements, new-role creation

How psychological tactics can expose refusal limits of LLM: preprint

Related Posts

Leave a reply Cancel reply

Awards Nomination Banner

gamification list

top placement

Whitepapers

Middle Placement

Case Studies

Bottom Sidebar

Other News