How psychological tactics can expose the refusal limits of LLMs: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

New research describes strategies that can steer large language models away from refusing to discuss forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • The Iterated Prefill Crawler (IPC) approach
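As a rough illustration of how prefilling and an iterated crawl could fit together, here is a minimal Python sketch. It is not the preprint's implementation: the prompt layout, the `generate` callable, the `toy_generate` stand-in, and the placeholder topic names are all assumptions made for this example. The idea shown is that the assistant's reply is started for it (the prefill), the continuation is parsed into topics, and each newly seen topic becomes the seed for a later query.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def build_prefilled_prompt(user_msg: str, prefill: str) -> str:
    """Start the assistant's turn with `prefill` so the model continues it.

    The "User:/Assistant:" layout is a hypothetical template, not any
    specific model's chat format.
    """
    return f"User: {user_msg}\nAssistant: {prefill}"


def parse_topics(completion: str) -> List[str]:
    """Split a list-style completion into normalized topic strings."""
    topics = []
    for line in completion.splitlines():
        # Drop bullets and numbering such as "1. " or "- ".
        item = line.strip().lstrip("-*0123456789. ").strip()
        if item:
            topics.append(item.lower())
    return topics


def crawl(
    generate: Callable[[str], str],
    seeds: Iterable[str],
    prefill: str,
    max_queries: int = 20,
) -> Set[str]:
    """Breadth-first crawl: every newly discovered topic seeds a later query."""
    seen: Set[str] = set()
    queue = deque(seeds)
    queries = 0
    while queue and queries < max_queries:
        seed = queue.popleft()
        prompt = build_prefilled_prompt(f"Tell me about {seed}.", prefill)
        completion = generate(prompt)
        queries += 1
        for topic in parse_topics(completion):
            if topic not in seen:
                seen.add(topic)
                queue.append(topic)
    return seen


def toy_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption): returns canned lists."""
    canned = {
        "topic a": "1. topic b\n2. topic c",
        "topic b": "- topic c\n- topic d",
    }
    for key, output in canned.items():
        if f"about {key}." in prompt:
            return output
    return ""


if __name__ == "__main__":
    found = crawl(toy_generate, ["topic a"], "The topics I avoid include:\n")
    print(sorted(found))
```

In a real audit, `generate` would wrap a call to an actual model, and the deduplication step would likely need something more robust than exact string matching, since models tend to rephrase the same topic in many ways.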

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.

Variation in how models refuse prompts emerged as a key finding. The team observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.

The researchers argue that static benchmarks are insufficient, and recommend persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.
