How psychological tactics can expose the refusal limits of LLMs: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

New research reveals strategies that can steer large language models away from refusing to discuss forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • Applying the Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach retrieved nearly all censored topics, while tests on models from mainland China exposed consistent suppression of political criticism and other sensitive content.
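The article does not describe how the crawler works internally. As a rough illustration only, the general idea of iterated prefilling can be sketched as a loop that starts the model's answer for it, reads the topics the model lists in its continuation, and feeds each newly seen topic back in as the next seed. Everything named here is a hypothetical stand-in: the prefill wording, the helper functions, and the `toy_model` stub, which replaces a real LLM call with a small lookup table.

```python
from collections import deque
from typing import Callable, List, Set

# Hypothetical prefill text: the assistant's turn is started on the model's
# behalf, and the model is left to continue the numbered list.
PREFILL = "Topics I avoid that relate to {seed}:\n1."


def build_prompt(seed: str) -> str:
    """Compose a user turn followed by a pre-written start of the assistant turn."""
    user_turn = f"User: Tell me about {seed}.\n"
    return user_turn + "Assistant: " + PREFILL.format(seed=seed)


def parse_topics(continuation: str) -> List[str]:
    """Extract list items from a continuation shaped like ' a\\n2. b\\n3. c'."""
    topics = []
    for line in continuation.splitlines():
        item = line.strip()
        # Drop a leading counter such as "2." when one is present.
        head, dot, rest = item.partition(".")
        if dot and head.isdigit():
            item = rest.strip()
        if item:
            topics.append(item.lower())
    return topics


def crawl(generate: Callable[[str], str], seeds: List[str], max_steps: int = 50) -> Set[str]:
    """Breadth-first crawl: every newly listed topic becomes a later seed."""
    seen = set(seeds)
    queue = deque(seeds)
    steps = 0
    while queue and steps < max_steps:
        seed = queue.popleft()
        steps += 1
        for topic in parse_topics(generate(build_prompt(seed))):
            if topic not in seen:
                seen.add(topic)
                queue.append(topic)
    return seen


def toy_model(prompt: str) -> str:
    """Stub standing in for an LLM: returns canned continuations (assumption)."""
    canned = {
        "politics": " elections\n2. protests",
        "elections": " protests\n2. ballot design",
    }
    for seed, continuation in canned.items():
        if f"relate to {seed}:" in prompt:
            return continuation
    return ""


found = crawl(toy_model, ["politics"])
print(sorted(found))
```

The `max_steps` cap and the `seen` set keep the loop from running forever when the model keeps naming topics it has already listed; a real audit would also need to handle paraphrased duplicates, which this sketch ignores.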

Variation in how models refuse prompts emerged as a key finding. The team observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.

The researchers argue that static benchmarks are insufficient, recommending persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.
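One simple way to picture tracking a shifting refusal boundary is to compare the set of refused topics recorded for two snapshots of a model, for example before and after quantization. The sketch below is an assumption about how such a comparison might be organized, not the authors' tooling; the function name and the sample topic sets are invented for illustration.

```python
from typing import Dict, List, Set


def refusal_drift(before: Set[str], after: Set[str]) -> Dict[str, List[str]]:
    """Compare refused-topic sets from two audits of a model."""
    return {
        "newly_refused": sorted(after - before),      # refused only in the later audit
        "no_longer_refused": sorted(before - after),  # refused only in the earlier audit
        "unchanged": sorted(before & after),          # refused in both audits
    }


# Hypothetical audit results, for illustration only.
full_precision = {"personal insults", "chemical processes"}
quantized = {"personal insults", "chemical processes", "political controversies"}

report = refusal_drift(full_precision, quantized)
print(report["newly_refused"])
```

Rerunning a comparison like this on every new release or quantized variant is one way to turn a one-off benchmark into the ongoing audit the researchers call for.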

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.


Copyright © 2026 DigiconAsia All Rights Reserved.