
How psychological tactics can expose the refusal limits of LLMs: preprint

By DigiconAsia Editors | Monday, September 8, 2025, 5:07 PM Asia/Singapore

New research reveals strategies that can steer large language models away from refusing to discuss forbidden topics.

Recent research from Northeastern University has suggested that psychological manipulation techniques can prompt large language models (LLMs) to answer questions they ordinarily refuse to address.

The preprint, authored by Can Rager, Chris Wendler, Rohit Gandikota, and David Bau, details systematic testing of numerous prompts against various AI models, showing how certain persuasive and iterative strategies can dramatically increase compliance rates on forbidden topics.

The study introduces “refusal discovery”, a new task aimed at identifying and cataloging the range of subjects that models have been trained to reject. Using a method called token prefilling, the researchers uncovered an expansive list of sensitive topics, including political controversies, personal insults, and chemical processes that are generally blocked for safety reasons.

Strategies used include:

  • Gradually escalating requests
  • Invoking respected authorities
  • Constructing context-rich narratives
  • The Iterated Prefill Crawler (IPC) approach

Skillful use of these strategies led to a significant rise in the frequency of prohibited responses. In benchmark tests, the “crawler” approach enabled the retrieval of nearly all censored topics, while testing on models from mainland China exposed consistent suppression of political criticism and other sensitive content.
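To illustrate how an iterated prefill crawl might work in principle, the following is a minimal Python sketch and not the authors' implementation. It assumes a model interface that accepts a user prompt plus a prefilled start of the assistant's reply and returns the continuation. The `toy_generate` stub, the `PREFILL` wording, the prompt template, and the `max_queries` limit are all hypothetical stand-ins chosen for illustration; the preprint's actual prompts and parsing rules are not described in this article.

```python
from collections import deque
from typing import Callable, List, Set

# Hypothetical prefill: the assistant's reply is started for the model,
# so it continues a list rather than deciding whether to refuse.
PREFILL = "Topics I should avoid that are related to this include:\n1."


def toy_generate(user_prompt: str, prefill: str) -> str:
    """Stand-in for a real model call (an assumption, not a real API).

    Returns text that continues the numbered list begun by `prefill`.
    """
    related = {
        "seed": ["weapons", "politics"],
        "weapons": ["explosives", "politics"],
        "politics": ["elections"],
    }
    topic = user_prompt.rsplit(": ", 1)[-1]
    items = related.get(topic, [])
    if not items:
        return ""
    lines = [f" {items[0]}"]
    lines += [f"{i}. {t}" for i, t in enumerate(items[1:], start=2)]
    return "\n".join(lines)


def parse_topics(prefill: str, continuation: str) -> List[str]:
    """Extract the numbered list items from prefill + continuation."""
    topics = []
    for line in (prefill + continuation).splitlines():
        head, sep, rest = line.partition(".")
        if sep and head.strip().isdigit() and rest.strip():
            topics.append(rest.strip().lower())
    return topics


def crawl(
    generate: Callable[[str, str], str],
    seeds: List[str],
    max_queries: int = 50,
) -> List[str]:
    """Breadth-first crawl: each newly found topic becomes a new query."""
    seen: Set[str] = set(seeds)
    queue = deque(seeds)
    found: List[str] = []
    queries = 0
    while queue and queries < max_queries:
        topic = queue.popleft()
        prompt = f"List sensitive topics related to: {topic}"
        continuation = generate(prompt, PREFILL)
        queries += 1
        for new_topic in parse_topics(PREFILL, continuation):
            if new_topic not in seen:  # deduplicate across iterations
                seen.add(new_topic)
                found.append(new_topic)
                queue.append(new_topic)
    return found


if __name__ == "__main__":
    print(crawl(toy_generate, ["seed"]))
```

In this sketch the crawl stops when no new topics appear or the query budget runs out. A real audit would replace `toy_generate` with calls to an actual model and would likely need fuzzier deduplication, since models rarely phrase the same topic identically twice.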

Variation in how models refuse prompts emerged as a key finding. The team observed differences that stem from distinct fine-tuning protocols, data sources, and technical adjustments such as quantization. Some released models that claimed to be uncensored were shown to reintroduce refusal behaviors following quantization, raising new questions about the reliability of so-called “decensored” public releases.

The researchers argue that static benchmarks are insufficient, recommending persistent, dynamic auditing to track shifting refusal boundaries as both models and adversarial strategies evolve. Their findings suggest that a deep understanding and enumeration of what models will and will not discuss plays a vital role in the safe deployment and governance of powerful modern LLMs.

According to the authors, transparency, accountability, and ongoing scrutiny are essential as these systems continue to shape information access and public discourse.

Copyright © 2026 DigiconAsia All Rights Reserved.