How psychological tactics can expose the refusal limits of LLMs: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

New research reveals strategies that can steer large language models away from refusing to engage with forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, describes systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, in which the opening tokens of the model's reply are supplied in advance, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.
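The iterated crawling idea can be illustrated, very loosely, as a worklist search: each topic the model names is fed back in as a new starting point until nothing new turns up or a query budget runs out. The Python sketch below is a generic breadth-first crawl under that assumption, not the authors' Iterated Prefill Crawler. `crawl_refused_topics` is a hypothetical name, and the `query` callback is a hypothetical stand-in for a single prefilled model call, stubbed here with a toy lookup table.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_refused_topics(
    seeds: Iterable[str],
    query: Callable[[str], list[str]],
    max_queries: int = 100,
) -> set[str]:
    """Iteratively expand a catalog of topics (illustrative sketch only).

    `query` is a hypothetical stand-in for one prefilled model call: given a
    known topic, it returns further topics the model names as off-limits.
    """
    seen: set[str] = set()
    frontier: deque[str] = deque()
    for topic in seeds:
        norm = topic.strip().lower()  # crude normalization for deduplication
        if norm and norm not in seen:
            seen.add(norm)
            frontier.append(norm)

    queries = 0
    while frontier and queries < max_queries:  # stop when exhausted or over budget
        topic = frontier.popleft()
        queries += 1
        for found in query(topic):
            norm = found.strip().lower()
            if norm and norm not in seen:
                seen.add(norm)
                frontier.append(norm)  # feed each new topic back in as a seed
    return seen


# Toy usage: a hard-coded table replaces any real model call.
toy_graph = {
    "topic a": ["Topic B", "topic c"],
    "topic b": ["topic d"],
    "topic c": ["topic a"],
}
catalog = crawl_refused_topics(["Topic A"], lambda t: toy_graph.get(t, []))
print(sorted(catalog))  # ['topic a', 'topic b', 'topic c', 'topic d']
```

The deduplicating `seen` set is what lets the crawl terminate even when topics point back at each other, as "topic c" does here; the query budget is a second, independent safeguard.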

Variation in how models refuse prompts emerged as a key finding. The team observed differences stemming from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors after quantization, raising new questions about the reliability of so-called “decensored” public releases.

The researchers argue that static benchmarks are insufficient and recommend persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.
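To make "shifting refusal boundaries" concrete, one simple audit step is to compare two catalogs of refused topics, for example one collected from a full-precision release and one from its quantized variant. The Python sketch below assumes such catalogs already exist as sets of topic labels; it is an illustration, not the researchers' auditing method, and the function name `refusal_drift` and the toy topic labels are hypothetical.

```python
def refusal_drift(before: set[str], after: set[str]) -> dict[str, set[str]]:
    """Compare two refusal catalogs, e.g. a release and its quantized variant.

    Hypothetical helper: how each catalog is collected is out of scope here.
    """
    return {
        "newly_refused": after - before,      # refusals that (re)appeared
        "no_longer_refused": before - after,  # refusals that disappeared
        "unchanged": before & after,          # refusals present in both
    }


# Toy usage: the quantized variant refuses one topic the original did not.
full_precision = {"topic a"}
quantized = {"topic a", "topic b"}
drift = refusal_drift(full_precision, quantized)
print(sorted(drift["newly_refused"]))  # ['topic b']
```

Running the same comparison on every new release or quantization level, rather than once against a fixed benchmark, is one way to approximate the continuous auditing the authors call for.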

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.

Copyright © 2025 DigiconAsia All Rights Reserved.