RECENT STORIES:

Addressing digital sovereignty in a data-driven world
“GenAI bot, take my mom out for a head-spinning wheelchair joyride!”...
Laronix Secures $3.2 Million Grant from the Australian Government to B...
Starwood Capital Group, Doma Infrastructure Group and Telstra InfraCo ...
Straits Interactive, in collaboration with Golden Gate University, Edu...
New Sandvik report reveals a golden opportunity to attract engineers t...
LOGIN REGISTER
DigiconAsia
  • Features
    • Featured

      How AI-driven discovery and social commerce are reshaping Singles Day 2025

      How AI-driven discovery and social commerce are reshaping Singles Day 2025

      Tuesday, November 11, 2025, 8:32 AM Asia/Singapore | e-Commerce, Features
    • Featured

      How useful is synthetic research and synthetic data?

      How useful is synthetic research and synthetic data?

      Wednesday, November 5, 2025, 2:52 PM Asia/Singapore | Features, Newsletter
    • Featured

      Leveraging CRM platform for AI-powered financial inclusion in Asia

      Leveraging CRM platform for AI-powered financial inclusion in Asia

      Friday, October 17, 2025, 2:34 PM Asia/Singapore | Features
  • News
    • Featured

      “GenAI bot, take my mom out for a head-spinning wheelchair joyride!”

      “GenAI bot, take my mom out for a head-spinning wheelchair joyride!”

      Thursday, November 13, 2025, 11:25 AM Asia/Singapore | News, Newsletter
    • Featured

      Respected global news agency caught in multiple editorial scandals, triggering leadership resignations

      Respected global news agency caught in multiple editorial scandals, triggering leadership resignations

      Wednesday, November 12, 2025, 6:45 AM Asia/Singapore | News, Newsletter
    • Featured

      Turns out your fingers are better at treasure hunting than you thought

      Turns out your fingers are better at treasure hunting than you thought

      Tuesday, November 11, 2025, 7:50 AM Asia/Singapore | News, Newsletter
  • Perspectives
  • Tips & Strategies
  • Whitepapers
  • Awards 2023
  • Directory
  • E-Learning

Select Page

News

How psychological tactics can expose refusal limits of LLM: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

How psychological tactics can expose refusal limits of LLM: preprint

New research reveals new strategies that can induce large language model away from refusing to process forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details a systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies had led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.

Variation in how models refuse prompts emerged as a key finding. The team had observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.The researchers argue that static benchmarks are insufficient, recommending persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss, plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.

Share:

PreviousSurvey of Hong Kong workers explores challenges confronting workers’ financial well-being
NextAduna and SK telink Announce Collaboration to Bring Korea Into the Global Network API Ecosystem

Related Posts

How telcos are readying for the emerging Network API economy

How telcos are readying for the emerging Network API economy

October 18, 2024

Take customer experience to the next level with these four metrics

Take customer experience to the next level with these four metrics

April 24, 2020

When the Cloud gets too complex, pressure rains on IT/tech leaders

When the Cloud gets too complex, pressure rains on IT/tech leaders

March 31, 2023

Award-winning businesses share their data integration, API management and AI journeys

Award-winning businesses share their data integration, API management and AI journeys

June 5, 2025

Leave a reply Cancel reply

You must be logged in to post a comment.

Awards Nomination Banner

gamification list

PARTICIPATE NOW

top placement

Whitepapers

  • Achieve Modernization Without the Complexity

    Achieve Modernization Without the Complexity

    Transforming IT infrastructure is crucial …Download Whitepaper
  • 5 Steps to Boost IT Infrastructure Reliability

    5 Steps to Boost IT Infrastructure Reliability

    In today's fast-evolving tech landscape, …Download Whitepaper
  • Simplify Payroll Setup for Your Small Business

    Simplify Payroll Setup for Your Small Business

    In our free guide, "How …Download Whitepaper
  • Overcoming the Challenges of Cost & Complexity in the Cloud-first Era.

    Overcoming the Challenges of Cost & Complexity in the Cloud-first Era.

    Download Whitepaper

Middle Placement

Case Studies

  • Mergers and acquisitions drive urgent need for IT infrastructure overhaul: Access Group

    Mergers and acquisitions drive urgent need for IT infrastructure overhaul: Access Group

    Standardizing disparate enterprise-data infrastructures and …Read More
  • DIS recognized for driving open-source excellence in Singapore’s defense

    DIS recognized for driving open-source excellence in Singapore’s defense

    The Digital and Intelligence Service …Read More
  • Krom Bank renews cloud banking platform partnership to scale digital services in Indonesia

    Krom Bank renews cloud banking platform partnership to scale digital services in Indonesia

    The Indonesian digital bank will …Read More
  • Globe Business reduces overall customer service workload by 34% through digitalization

    Globe Business reduces overall customer service workload by 34% through digitalization

    This was the result of …Read More

Bottom Sidebar

Other News

  • Laronix Secures $3.2 Million Grant from the Australian Government to Bring its AI-Powered Voice Technology to the Market

    November 13, 2025
    BRISBANE, Australia, Nov. 13, 2025 …Read More »
  • Starwood Capital Group, Doma Infrastructure Group and Telstra InfraCo Announce Agreement to Develop 62MW AI-Optimised Data Centre in Western Sydney

    November 13, 2025
    Development Approval has been secured …Read More »
  • Straits Interactive, in collaboration with Golden Gate University, Eduvate Hub, and upGrad, Launches ‘The AI Factory – AI Capability Guide for SMEs’

    November 13, 2025
    SINGAPORE, Nov. 13, 2025 /PRNewswire/ …Read More »
  • New Sandvik report reveals a golden opportunity to attract engineers to the mining industry

    November 12, 2025
    STOCKHOLM, Nov. 12, 2025 /PRNewswire/ …Read More »
  • Nextvestment provides financial guidance co-pilot software for Phillip Securities’ trading platform’s AI capability

    November 12, 2025
    SINGAPORE, Nov. 12, 2025 /PRNewswire/ …Read More »
  • Our Brands
  • CybersecAsia
  • MartechAsia
  • Home
  • About Us
  • Contact Us
  • Sitemap
  • Privacy & Cookies
  • Terms of Use
  • Advertising & Reprint Policy
  • Media Kit
  • Subscribe
  • Manage Subscriptions
  • Newsletter

Copyright © 2025 DigiconAsia All Rights Reserved.