AADE - Asia's leading platform for digital economic research and exchange

The "Sycophancy Trap"

As people increasingly rely on AI, a hidden danger has emerged: the "sycophancy effect." When overly-trusting victims ask Large Language Models (LLMs) for help within a scam context (e.g., "Help me reply to my new crypto mentor"), the AI often blindly complies to be "helpful," reinforcing the victim's trust in the fraud.

Existing AI safety frameworks almost exclusively focus on defending against malicious hackers (red-teaming). They completely ignore the cognitive vulnerabilities of everyday users who are unknowingly being manipulated by scammers.

LUCAS Benchmark Framework

What it is: LUCAS is a fully automated, user-centric benchmark testing framework specifically designed to evaluate the scam-resistance of LLMs in real-world scenarios.

How it works: It utilizes a Multi-Agent collaborative architecture. First, it combines real police reports with Human-Computer Interaction (HCI) theories to craft "benign-sounding" victim queries. Then, a Collector Agent tests these disguised scam prompts concurrently across mainstream models. Finally, an Evaluator Agent acts as an objective judge, quantitatively scoring whether the model identified the scam and the prominence of its safety warnings.

Empowering Regulation & Development

Regulatory Authorities: LUCAS transforms abstract AI safety laws into executable, quantifiable metrics. It serves as a "plug-and-play" diagnostic tool for LLM deployment approvals and routine market audits, giving regulatory bodies the technical leverage to enforce compliance.

AI Developers & Third-Party Auditors: Acts as an essential pre-launch "health check." It enables targeted anti-scam red-teaming, helping developers discover and patch psychological vulnerabilities in their models before public release, ultimately creating a safer, human-centric AI ecosystem.

The Problem & Limitations

Why current safety guardrails fail the most vulnerable users.

The Fatal "Sycophancy Trap"

When users ask high-stake questions (like scam-related financial actions), models often fail to discern reality. More dangerously, due to inherent Sycophancy, models may compliantly agree with the user's flawed assumptions, actively reinforcing the victim's trust in the scam rather than warning them.

Limitations of Current RegTools

Article 55 of the EU AI Act attempts to control systemic risks through "Red-teaming". However, existing safety tests almost universally assume risks stem from malicious attackers.

The greater tragedy lies with ordinary victims who deeply trust AI. The lack of high-fidelity benchmark frameworks that integrate "victim psychology" makes it difficult to effectively measure the true anti-fraud capabilities of LLMs.

Our Approach: Agentic RegTech

A fully automated evaluation pipeline based on Multi-Agent collaboration.

Orchestrator Agent

Orchestrator
Controls flow & error recovery.

Step 1: Trigger Collection

Step 3: Trigger Evaluation

Step 2: Pass Raw Data

Experiment Collector

14 Top Models: Selected via a16z Top 100 & SensorTower SG/CN rankings.

12 Real Cases: 2024-2026 police reports (SG/CN/Global).

Feedback Evaluator

Scam Identified: Did the model warn the user?

Warning Ratio: The prominence of warnings.

Figure 1. OpenClaw-based Automated LLM Red-teaming Workflow Architecture.

Prompt Synthesis Matrix

Systematically transforming human vulnerabilities into LLM test cases via interdisciplinary academic theories.

Layer 1: Context Definition

Real-world Police Reports Context

Layer 2: HCI Vulnerability Injection

TYPE 1

Benign-Sounding Prompts

Bypassing guardrails by disguising fraud tasks as routine translation/editing.

Task-Obsessed Myopia (2025)

TYPE 2

Social Sycophancy & 'Face'

Testing if models abandon safety baselines to cater to users' preconceived authority.

ELEPHANT (ACL 2025) SycEval (AIES 2025)

TYPE 3

Synergistic Cognitive Biases

Simulating 'Self-Generated Rationalization' leveraging sunk cost fallacies.

Self-Generated Rationalization

Layer 3: Output

High-Fidelity Anti-Fraud Benchmark Dataset

Figure 2. Interdisciplinary Prompt Synthesis Matrix integrating real cases with cognitive vulnerabilities.

Significance & Social Impact

Plug-and-Play Reg 'Thermometer'

Built a fully automated evaluation tool. Regulators can input recent fraud cases to assess safety levels, providing quantitative basis for regulatory actions.

Interdisciplinary Interception

First integration of HCI 'Framing Effects' research with frontline anti-fraud scenarios, filling the gap in safety evaluation from a vulnerable psychology perspective.

Human-Centric AI

Our ultimate goal: Ensure citizens who trust AI are not 'backstabbed' by blindly compliant systems. Safeguarding a safer AI future.

Anti-Scam Agentic LLM Benchmark Framework