AADE Global Lab
Pillar IV | Digital Innovation

RAG-based Legal Infrastructure

Janus: A RAG-Powered Legal and Compliance Knowledge Infrastructure

Cross-Jurisdiction Compliance Blind Spot & AI Hallucinations

Understanding foreign regulations is difficult—even for legal professionals. Legal systems vary widely, and real enforcement standards are often opaque.

AI tools worsen the problem: LLMs frequently produce hallucinated legal answers without grounding in actual law, making them unreliable for compliance decisions.

Enterprises lack a trusted, jurisdiction-aware source of truth for cross-border regulation.

Janus: RAG-Based Legal Intelligence

What it is: A legal infrastructure that encodes laws and local expert annotations into a structured, machine-readable system.

How it works: Queries first trigger retrieval of jurisdiction-specific laws and expert commentary, then reasoning is performed on this grounded context—not the open internet.

This ensures outputs that are traceable, jurisdiction-aware, and hallucination-resistant.
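The retrieve-then-reason flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Janus's implementation. `LegalSource`, `retrieve`, and `build_grounded_prompt` are hypothetical names, and naive term overlap stands in for the real retrieval stack. The sketch shows two things: the prompt is built only from jurisdiction-filtered sources, and the system abstains when nothing is retrieved instead of answering from general knowledge.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

@dataclass(frozen=True)
class LegalSource:
    source_id: str     # citation key for a clause or commentary (hypothetical format)
    jurisdiction: str  # e.g. "HK", "CN"
    text: str

def _terms(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens; a stand-in for real query analysis.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(corpus: Iterable[LegalSource], query: str, jurisdictions: Set[str]) -> List[LegalSource]:
    """Keep sources from the requested jurisdictions that share at least one term with the query."""
    q = _terms(query)
    return [s for s in corpus if s.jurisdiction in jurisdictions and q & _terms(s.text)]

def build_grounded_prompt(query: str, sources: List[LegalSource]) -> Optional[str]:
    """Build a prompt whose only context is the retrieved sources; return None (abstain) if none."""
    if not sources:
        return None  # no grounding: abstain rather than fall back to open-web knowledge
    context = "\n".join(f"[{s.source_id}] {s.text}" for s in sources)
    return ("Answer using ONLY the sources below and cite their IDs.\n"
            f"{context}\n\nQuestion: {query}")
```

Returning `None` rather than an ungrounded prompt is what keeps the output traceable: every answer that does get generated can be tied back to a retrieved `source_id`.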

Toward Executable Compliance

Janus is evolving to translate law into machine-executable constraints, starting with on-chain financial activity.

It targets market integrity risks (e.g., wash trading, TVL inflation, token manipulation) by combining legal rules, expert input, and on-chain signals.
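As an illustration of a market-integrity rule turned into an executable constraint, the sketch below flags two simple wash-trading patterns: self-trades (same address on both sides) and A→B / B→A round trips of similar size in the same token within a time window. `Trade`, `flag_wash_trades`, and the default thresholds are hypothetical; real detection would also need address clustering and volume analysis that this heuristic omits.

```python
from dataclasses import dataclass
from typing import Iterable, Set

@dataclass(frozen=True)
class Trade:
    tx_id: str
    token: str
    seller: str
    buyer: str
    amount: float
    timestamp: int  # seconds since epoch

def flag_wash_trades(trades: Iterable[Trade], window_s: int = 3600, tolerance: float = 0.05) -> Set[str]:
    """Flag self-trades, plus A->B then B->A round trips in the same token whose
    amounts differ by at most `tolerance` (relative) within `window_s` seconds."""
    ordered = sorted(trades, key=lambda t: t.timestamp)
    flagged: Set[str] = set()
    for i, t in enumerate(ordered):
        if t.seller == t.buyer:          # self-trade: same address on both sides
            flagged.add(t.tx_id)
            continue
        for u in ordered[i + 1:]:
            if u.timestamp - t.timestamp > window_s:
                break                    # trades are time-ordered, so later ones are out of window
            is_reverse = u.seller == t.buyer and u.buyer == t.seller
            if u.token == t.token and is_reverse and abs(u.amount - t.amount) <= tolerance * t.amount:
                flagged.update({t.tx_id, u.tx_id})
    return flagged
```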

Outputs are verifiable PoMI reports, designed for regulators, law firms, and auditors.

Differentiation: jurisdiction-aware rule overrides combined with native on-chain risk detection, extending beyond the scope of traditional compliance tools.
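Jurisdiction-aware overrides can be modeled as a base rule set plus per-jurisdiction deltas, with a "stricter value wins" merge when one activity touches several jurisdictions. This is a sketch under assumptions: the parameter names, the jurisdictions `JUR_A` and `JUR_B`, and their values are placeholders, not statements of any real law.

```python
from functools import reduce

# Base parameters for a hypothetical market-integrity rule; all values are placeholders.
BASE_RULES = {
    "retail_offering_allowed": True,
    "round_trip_window_s": 3600,
    "requires_licensed_custodian": False,
}

# Overrides for hypothetical jurisdictions JUR_A and JUR_B (not real law).
OVERRIDES = {
    "JUR_A": {"requires_licensed_custodian": True},
    "JUR_B": {"retail_offering_allowed": False, "round_trip_window_s": 7200},
}

# For each parameter, how to pick the stricter of two values when jurisdictions disagree.
STRICTER = {
    "retail_offering_allowed": min,       # False (prohibited) beats True
    "round_trip_window_s": max,           # a longer look-back window catches more patterns
    "requires_licensed_custodian": max,   # True (required) beats False
}

def rules_for(jurisdiction):
    """Base rules with one jurisdiction's overrides applied; overrides may tighten or relax."""
    return {**BASE_RULES, **OVERRIDES.get(jurisdiction, {})}

def effective_rules(jurisdictions):
    """Rules for an activity touching several jurisdictions: the stricter value wins per parameter."""
    per_jurisdiction = [rules_for(j) for j in jurisdictions] or [dict(BASE_RULES)]
    return {key: reduce(STRICTER[key], (r[key] for r in per_jurisdiction)) for key in BASE_RULES}
```

Applying overrides per jurisdiction first, then combining, lets a single jurisdiction relax a base default without that relaxation leaking into cross-border cases where another jurisdiction is stricter.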

Modular RAG Architecture

Janus moves beyond standard vector search, adopting a multi-stage architecture optimized for the strict accuracy, traceability, and jurisdictional constraints of the legal domain.

1. Knowledge Ingestion

Normalizes heterogeneous legal texts and implicit expert knowledge into a unified, machine-interpretable schema, preserving dependencies to build a hybrid Legal Knowledge Graph (LKG).

  • Hierarchical Chunking (Dependencies)
  • Obligation Classifiers & NER
  • Expert Annotation Binding
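A minimal sketch of hierarchical chunking with expert annotation binding, assuming clauses arrive as dotted-number IDs (e.g. `3.2.1`) with their text. `Chunk`, `chunk_hierarchically`, `bind_annotations`, and `with_context` are hypothetical names. The sketch covers only the dependency-preserving and annotation steps, not the obligation classifiers or NER.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass
class Chunk:
    clause_id: str             # dotted clause number, e.g. "3.2.1" (assumed numbering scheme)
    parent_id: Optional[str]   # enclosing clause, kept as a dependency edge
    text: str
    annotations: List[str] = field(default_factory=list)

def chunk_hierarchically(clauses: Iterable[Tuple[str, str]]) -> Dict[str, Chunk]:
    """One chunk per clause; the parent is derived by dropping the last dotted component."""
    chunks = {}
    for clause_id, text in clauses:
        parent = clause_id.rsplit(".", 1)[0] if "." in clause_id else None
        chunks[clause_id] = Chunk(clause_id, parent, text)
    return chunks

def bind_annotations(chunks: Dict[str, Chunk], notes: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Attach expert notes to their clauses; notes pointing at unknown clauses are returned for review."""
    unbound = []
    for clause_id, note in notes:
        if clause_id in chunks:
            chunks[clause_id].annotations.append(note)
        else:
            unbound.append((clause_id, note))
    return unbound

def with_context(chunks: Dict[str, Chunk], clause_id: str) -> str:
    """Clause text prefixed by its ancestors, so a sub-clause is never read out of context."""
    parts = []
    current: Optional[str] = clause_id
    while current is not None and current in chunks:
        parts.append(chunks[current].text)
        current = chunks[current].parent_id
    return " > ".join(reversed(parts))
```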

2. Hybrid Retrieval Engine

Maximizes recall and precision while strictly adhering to legal constraints (jurisdiction, effective dates). Fuses semantic, keyword, and graph traversal modes.

  • Dense Retrieval (Vector Semantics)
  • Sparse Retrieval (BM25 Keywords)
  • Graph-Aware Jurisdictional Filtering
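The fusion step can be sketched with Reciprocal Rank Fusion (RRF), a common way to merge ranked lists; whether Janus uses RRF is an assumption. Jurisdiction and effective-date constraints are applied as a hard filter before any ranking, so an out-of-scope clause cannot be ranked back in. The document fields are hypothetical, term-overlap counting stands in for BM25 to keep the example dependency-free, and a graph-traversal ranking could be passed to `rrf` as a third list.

```python
import math
from datetime import date
from typing import Dict, List, Sequence, Set

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hard_filter(docs: List[dict], jurisdictions: Set[str], as_of: date) -> List[dict]:
    """Schema constraints applied before ranking: wrong jurisdiction or not yet in force means excluded."""
    return [d for d in docs if d["jurisdiction"] in jurisdictions and d["effective"] <= as_of]

def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(doc) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

def hybrid_search(docs, query_vec, query_terms, jurisdictions, as_of):
    candidates = hard_filter(docs, jurisdictions, as_of)
    dense = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    # Term-overlap count as a dependency-free stand-in for BM25 scoring.
    sparse = sorted(candidates, key=lambda d: len(query_terms & d["terms"]), reverse=True)
    return rrf([[d["id"] for d in dense], [d["id"] for d in sparse]])
```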

3. Generation & Reasoning

Produces reliable, grounded, and auditable outputs from retrieved legal evidence. Enforces a "citation-first" reasoning logic in which every claim must be backed by a retrieved source, designed to keep LLM hallucinations out of the output.

  • Citation-Aware Generation
  • Multi-Jurisdiction Conflict Resolution
  • Constrained Decoding (Zero-Stochasticity)
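One way to enforce "citation-first" output is a post-generation check that rejects any sentence lacking a citation, or citing a source that was never retrieved. The sketch below assumes citations appear as bracketed IDs such as `[SFC-1]`; the format and the name `uncited_sentences` are hypothetical.

```python
import re
from typing import List, Set

CITATION = re.compile(r"\[([^\[\]]+)\]")

def uncited_sentences(answer: str, retrieved_ids: Set[str]) -> List[str]:
    """Return sentences that cite nothing, or cite an ID that was not actually retrieved."""
    # Split after ., ! or ? followed by whitespace, so "§3.2" is not treated as a boundary.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    bad = []
    for s in sentences:
        cited = CITATION.findall(s)
        if not cited or any(c not in retrieved_ids for c in cited):
            bad.append(s)
    return bad
```

A non-empty result would trigger regeneration or abstention rather than delivery of the answer.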

The Foundation: Graph + Vector + Schema

Janus's hybrid data model combines vector-based semantic retrieval, graph-based logical traversal, and schema-based hard filtering (e.g., restricting results to specific jurisdictions), a combination that moves well beyond traditional static legal databases.
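Graph-based traversal combined with schema filtering can be sketched as a bounded breadth-first walk over cross-reference edges, where nodes outside the allowed jurisdictions are neither returned nor expanded. The adjacency format, the node metadata shape, and the name `traverse` are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def traverse(graph: Dict[str, List[str]], nodes: Dict[str, dict], seeds: Iterable[str],
             allowed_jurisdictions: Set[str], max_hops: int = 2) -> List[str]:
    """Breadth-first walk from seed clauses along cross-reference edges.
    graph: clause_id -> referenced clause_ids; nodes: clause_id -> {"jurisdiction": ...}."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, hops = queue.popleft()
        # Schema filter: out-of-scope nodes are dropped and never expanded.
        if node in seen or nodes.get(node, {}).get("jurisdiction") not in allowed_jurisdictions:
            continue
        seen.add(node)
        order.append(node)
        if hops < max_hops:
            for nxt in graph.get(node, []):
                queue.append((nxt, hops + 1))
    return order
```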

Enterprise Applications

Same real-world query. Three retrieval architectures. Drastically different outcomes. This is why knowledge infrastructure design is the true moat.

// Live Query — RWA Cross-Border Compliance
> USER_INPUT: "Can a mainland China enterprise publicly issue RWA tokens to retail investors in Hong Kong?"

1. No RAG: Public Domain Blind Search
Technical Logic

The LLM has no domain knowledge base. It scrapes fragmented public sources (KOL blogs, news, social media) and summarizes whatever surfaces.

Interaction Flow
User → (asks) Generic LLM → (scrapes public web) Public Web [KOL posts · news · social media] → (returns fragments) Hallucinated Answer
Execution Log
// retrieval trace
> source: public_web_scrape
> retrieved: [
  "kol_hk_web3_2023.html",
  "news_sfc_opens_retail.txt",
  "telegram_rwa_hype.txt"
]
> jurisdiction_filter: NONE
> cn_mainland_rules: NOT CHECKED
> forex_control_flag: NOT CHECKED
// model output
"Yes — HK embraces Web3. SFC has officially opened retail crypto trading. RWA issuance is broadly feasible..."
// ⚠ FATAL: PRC prohibition on virtual asset sales to mainland residents was IGNORED
Outcome
Acting on this output exposes the enterprise to criminal & administrative penalties under PRC regulations. Zero citations. Zero reliability.

2. Pure Legal RAG: Dogmatic Retrieval
Technical Logic

A pre-built vector knowledge base (KB) holds raw legal clauses. At query time, the agent fetches the matching clause and assembles a JSON prompt before calling the analysis model.

Interaction Flow
① Setup Phase: Raw Legal Corpus → (parse + embed) Vector Knowledge Base
② Query Phase: User → (query) RAG Agent [fetch clause ↔ KB] → (JSON prompt assembly) Literal-Only Answer
Execution Log
// kb fetch + prompt assembly
> kb.clauses.fetch({
  jurisdiction: "HK",
  topic: "RWA_issuance"
})
> retrieved: ["SFC-26EC22"]
> prompt_payload: {
  "query": user_input,
  "context": [SFC_26EC22.clause_text]
}
// critical gaps
> expert_notes: null
> regulator_attitude: null
> HKMA-20240220 custody: MISSED
> CN_56hao_prereq: MISSED
> PI_only_in_practice: null
Outcome
It cites SFC-26EC22 accurately but misses the HKMA custody bottleneck, the 18-month SFC approval cycle, and the PRC forex wall. Technically correct, operationally useless.

3. Janus RAG ✦: Expert Experience Injection
Technical Logic

Each knowledge unit is a compound vector: Clause + Expert_Notes + Regulator_Attitude. Senior practitioners review and annotate every statute before indexing, and runtime retrieval pulls all three components via graph traversal.

Interaction Flow
① Setup Phase (Expert-in-the-loop): Statutes + Expert Review + Regulator Signals → (fused into compound unit) Compound Knowledge Unit [clause + notes + attitude] → (indexed into) Janus Knowledge Graph
② Query Phase: User → (query) Janus Agent [dense + sparse + graph retrieval] → (returns compound units) Citation-First LLM → (cited answer) Actionable & Auditable
Execution Log
// janus compound retrieval
> janus.graph.query({
  filters: {
    jurisdiction: ["HK","CN"],
    effective_after: "2024-01"
  },
  modes: ["dense","sparse","graph"]
})
// compound_units retrieved
> unit_1: {
  clause: "SFC-26EC22 §3.2",
  expert_note: "Retail open in law; PI-only in practice [Partner, Tier-1 Law Firm, 2025]",
  regulator: "sandbox=big-banks-only",
  cross_border: "56-hao → BLOCKS"
}
> unit_2: {
  clause: "HKMA-20240220-11",
  expert_note: "custody choke-point"
}
Outcome
Flags the PRC 56-hao forex prerequisite, the HKMA custody bottleneck, and the sandbox reality. Every claim is cited. Output is a directly actionable compliance strategy, not a recitation of statute.
// Janus knowledge unit = clause + expert_notes + regulator_attitude
Outcomes: (1) Hallucination risk | (2) Practitioner blind spots | (3) Actionable & auditable
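The compound knowledge unit from panel 3 can be sketched as three text fields plus a single retrieval vector built from all of them. This assumes a weighted sum of component embeddings; the weights, `CompoundUnit`, and the toy hashing `embed` function (a stand-in for a real embedding model) are hypothetical.

```python
import math
import zlib
from dataclasses import dataclass
from typing import List, Tuple

DIM = 64

def embed(text: str) -> List[float]:
    """Toy hashing embedding (stand-in for a real model): L2-normalised bag of words."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0  # crc32 keeps buckets stable across runs
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

@dataclass
class CompoundUnit:
    clause: str
    expert_note: str
    regulator_attitude: str

    def vector(self, weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> List[float]:
        """Weighted sum of the three component embeddings, re-normalised (weights are hypothetical)."""
        parts = [embed(self.clause), embed(self.expert_note), embed(self.regulator_attitude)]
        combined = [sum(w * p[i] for w, p in zip(weights, parts)) for i in range(DIM)]
        norm = math.sqrt(sum(v * v for v in combined))
        return [v / norm for v in combined] if norm else combined
```

Because the expert note contributes to the vector, a query phrased in practitioner terms (e.g. "professional investors") can surface a clause whose statutory text never uses those words, while the three fields stay separate for citation.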

Reliability & Benchmarks

Janus is designed to minimize hallucinations. Its benchmark targets, set against curated multi-jurisdictional legal QA datasets, call for performance well above generic LLMs and standard RAG models across all critical dimensions.

Multi-Dimensional Capability Comparison

Illustrative data reflecting the architectural objectives and benchmark targets of the Janus system.

Task Success: Substantially improves the precision of compliance-query resolution.
Expert Validation: Factual grounding validated by practicing attorneys.
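The two dimensions above could be scored roughly as follows, assuming each evaluation case records whether the answer resolved the query and which citations a reviewing attorney accepts. `EvalCase` and both metric functions are hypothetical; they show how such numbers might be computed, not how the illustrative figures were produced.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class EvalCase:
    resolved: bool                                   # reviewer judged the query correctly resolved
    cited: Set[str] = field(default_factory=set)     # citation IDs the system emitted
    accepted: Set[str] = field(default_factory=set)  # citations a reviewing attorney accepts as support

def task_success_rate(cases: List[EvalCase]) -> float:
    """Share of cases the reviewer marked as resolved."""
    return sum(c.resolved for c in cases) / len(cases) if cases else 0.0

def citation_precision(cases: List[EvalCase]) -> float:
    """Micro-averaged share of emitted citations that reviewers accept as supporting the answer."""
    emitted = sum(len(c.cited) for c in cases)
    supported = sum(len(c.cited & c.accepted) for c in cases)
    return supported / emitted if emitted else 0.0
```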