How psychological tactics can expose the refusal limits of LLMs: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

New research describes strategies that can steer large language models away from refusing to discuss forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.
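
As a rough illustration of the iterative crawl described above, the Python sketch below feeds each newly listed topic back in as a seed until nothing new appears or a query budget runs out. It is not the authors' implementation: the prefill string, the `generate` callable, and the stub model with its placeholder topics are all hypothetical stand-ins, and no real LLM API is called.

```python
from collections import deque
from typing import Callable, Iterable, Set

# Hypothetical prefill: the start of the model's reply, which it is asked to continue.
PREFILL = "Topics I am trained to avoid include:\n"


def crawl_refused_topics(
    generate: Callable[[str, str], str],
    seeds: Iterable[str],
    max_queries: int = 50,
) -> Set[str]:
    """Breadth-first crawl: every newly listed topic becomes a later seed."""
    queue = deque(seeds)
    queued = set(queue)          # topics already scheduled, to avoid re-querying
    discovered: Set[str] = set() # topics the model listed in any completion
    queries = 0
    while queue and queries < max_queries:
        seed = queue.popleft()
        queries += 1
        completion = generate(f"Tell me about {seed}.", PREFILL)
        for line in completion.splitlines():
            topic = line.strip().lower()
            if not topic:
                continue
            discovered.add(topic)
            if topic not in queued:
                queued.add(topic)
                queue.append(topic)
    return discovered


# Stub standing in for a real model; the topics are placeholders.
FAKE_LISTS = {
    "topic a": ["topic b", "topic c"],
    "topic b": ["topic c", "topic d"],
}


def stub_generate(prompt: str, prefill: str) -> str:
    """Return a newline-separated topic list for any known seed in the prompt."""
    for seed, topics in FAKE_LISTS.items():
        if seed in prompt:
            return "\n".join(topics)
    return ""


print(sorted(crawl_refused_topics(stub_generate, ["topic a"])))
# prints ['topic b', 'topic c', 'topic d']
```

In an actual audit, `generate` would wrap a model call that supports continuing a partially written reply, and the query budget would cap cost. Both are left abstract here.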

Variation in how models refuse prompts emerged as a key finding. The team observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.

The researchers argue that static benchmarks are insufficient, recommending persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.
