How psychological tactics can expose the refusal limits of LLMs: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

New research reveals strategies that can steer large language models away from refusing to engage with forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the crawler approach retrieved nearly all censored topics, while tests on models developed in mainland China exposed consistent suppression of political criticism and other sensitive content.
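
The article does not describe how the crawler works internally. As a rough illustration only, the Python sketch below shows the general shape of an iterative crawl: feed the topics found so far back to the model, collect any new ones, and stop once a round adds nothing. The functions `query_model`, `toy_model`, `normalize`, and the placeholder topics are hypothetical; in a real audit, `query_model` would wrap a call to the model under test (for example, one using the token prefilling mentioned above). This is not the researchers' implementation.

```python
from typing import Callable, Iterable

# Hypothetical signature: takes the topics found so far and returns
# whatever topic strings the model lists in response.
QueryFn = Callable[[list[str]], list[str]]


def normalize(topic: str) -> str:
    """Crude dedup key: lowercase and collapse whitespace (an assumption)."""
    return " ".join(topic.lower().split())


def crawl_topics(seed_topics: Iterable[str], query_model: QueryFn,
                 max_rounds: int = 10) -> list[str]:
    """Grow a topic list until a round yields nothing new or max_rounds is hit."""
    found: list[str] = []
    seen: set[str] = set()

    def add(topics: Iterable[str]) -> int:
        # Record unseen, non-empty topics in discovery order; return how many were new.
        added = 0
        for topic in topics:
            key = normalize(topic)
            if key and key not in seen:
                seen.add(key)
                found.append(key)
                added += 1
        return added

    add(seed_topics)
    for _ in range(max_rounds):
        # Feed everything discovered so far back in, as an iterative crawler would.
        if add(query_model(list(found))) == 0:
            break  # converged: no new topics surfaced this round
    return found


# Toy stand-in for a real model: each known topic "reveals" related ones.
TOY_GRAPH = {
    "topic a": ["topic b", "Topic C"],
    "topic b": ["topic d"],
    "topic c": [],
    "topic d": ["topic a"],
}


def toy_model(known: list[str]) -> list[str]:
    return [related for topic in known for related in TOY_GRAPH.get(topic, [])]


if __name__ == "__main__":
    print(crawl_topics(["topic a"], toy_model))
    # ['topic a', 'topic b', 'topic c', 'topic d']
```

Deduplicating on a normalized key keeps the crawl from looping on restatements of the same topic; a real audit would likely need fuzzier matching, since models phrase the same topic in many different ways.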

Variation in how models refuse prompts emerged as a key finding. The team observed differences stemming from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some publicly released models advertised as uncensored were shown to reintroduce refusal behaviors after quantization, raising new questions about the reliability of so-called "decensored" public releases.

The researchers argue that static benchmarks are insufficient, and recommend persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that thoroughly understanding and enumerating what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.

Copyright © 2025 DigiconAsia All Rights Reserved.