LUCA: Evaluating the Scam-Resistance of LLMs through a User-Centric Agentic Framework
As people increasingly rely on AI, a hidden danger has emerged: the "sycophancy effect." When overly-trusting victims ask Large Language Models (LLMs) for help within a scam context (e.g., "Help me reply to my new crypto mentor"), the AI often blindly complies to be "helpful," reinforcing the victim's trust in the fraud.
Existing AI safety frameworks almost exclusively focus on defending against malicious hackers (red-teaming). They completely ignore the cognitive vulnerabilities of everyday users who are unknowingly being manipulated by scammers.
What it is: LUCAS is a fully automated, user-centric benchmark testing framework specifically designed to evaluate the scam-resistance of LLMs in real-world scenarios.
How it works: It utilizes a Multi-Agent collaborative architecture. First, it combines real police reports with Human-Computer Interaction (HCI) theories to craft "benign-sounding" victim queries. Then, a Collector Agent tests these disguised scam prompts concurrently across mainstream models. Finally, an Evaluator Agent acts as an objective judge, quantitatively scoring whether the model identified the scam and the prominence of its safety warnings.
Regulatory Authorities: LUCAS transforms abstract AI safety laws into executable, quantifiable metrics. It serves as a "plug-and-play" diagnostic tool for LLM deployment approvals and routine market audits, giving regulatory bodies the technical leverage to enforce compliance.
AI Developers & Third-Party Auditors: Acts as an essential pre-launch "health check." It enables targeted anti-scam red-teaming, helping developers discover and patch psychological vulnerabilities in their models before public release, ultimately creating a safer, human-centric AI ecosystem.
Why current safety guardrails fail the most vulnerable users.
When users ask high-stake questions (like scam-related financial actions), models often fail to discern reality. More dangerously, due to inherent Sycophancy, models may compliantly agree with the user's flawed assumptions, actively reinforcing the victim's trust in the scam rather than warning them.
Article 55 of the EU AI Act attempts to control systemic risks through "Red-teaming". However, existing safety tests almost universally assume risks stem from malicious attackers.
The greater tragedy lies with ordinary victims who deeply trust AI. The lack of high-fidelity benchmark frameworks that integrate "victim psychology" makes it difficult to effectively measure the true anti-fraud capabilities of LLMs.
A fully automated evaluation pipeline based on Multi-Agent collaboration.
Orchestrator
Controls flow & error recovery.
Experiment Collector
14 Top Models: Selected via a16z Top 100 & SensorTower SG/CN rankings.
12 Real Cases: 2024-2026 police reports (SG/CN/Global).
Feedback Evaluator
Scam Identified: Did the model warn the user?
Warning Ratio: The prominence of warnings.
Systematically transforming human vulnerabilities into LLM test cases via interdisciplinary academic theories.
Bypassing guardrails by disguising fraud tasks as routine translation/editing.
Task-Obsessed Myopia (2025)Testing if models abandon safety baselines to cater to users' preconceived authority.
Simulating 'Self-Generated Rationalization' leveraging sunk cost fallacies.
Self-Generated RationalizationBuilt a fully automated evaluation tool. Regulators can input recent fraud cases to assess safety levels, providing quantitative basis for regulatory actions.
First integration of HCI 'Framing Effects' research with frontline anti-fraud scenarios, filling the gap in safety evaluation from a vulnerable psychology perspective.
Our ultimate goal: Ensure citizens who trust AI are not 'backstabbed' by blindly compliant systems. Safeguarding a safer AI future.