How Three Context Layers Cooperate — and How They Compare to Mainstream Approaches
How the sliding window, ancestor summary chain, and RAG semantic recall divide labor, compared against full history, fixed window, and RAG-only mainstream approaches.
The previous three articles introduced the sliding window, the ancestor summary chain, and dual-track RAG. This article brings them together: how they divide labor, how they complement each other, and how they compare to mainstream approaches.
§ 01Part 1 — Division of labor
┌──────────────────── messages sent to LLM ──────────────────────┐ │ [system] Main thread summary ← inter-thread: structural │ │ [system] Pin A summary ← inter-thread: ancestor chain│ │ [system] Pin B summary ← (larger budget = closer) │ │ [system] Anchor text ← inter-thread: focus point │ │ [system] Relevant file chunks ← RAG: document retrieval │ │ [system] Relevant past exchanges ← RAG: cross-thread recall │ │ [user/assistant] Recent 10 msgs ← intra-thread: recent conv │ └─────────────────────────────────────────────────────────────────┘
- Sliding window: recent conversation, most direct reference, always present
- Ancestor summary chain: structural background — tells the LLM where this sub-thread came from
- RAG: semantic associations that cross thread boundaries, filling gaps the summary layer can't reach
§ 02Part 2 — Mainstream approaches compared
Approach A: Full context
Pass everything to the LLM, relying on large-window models (Gemini 1M, GPT-4o 128K).
- Pro: complete information
- Con: cost scales linearly with conversation length; nested multi-thread causes token explosion; most content is irrelevant noise
Approach B: Fixed window
Pass only the most recent N messages, discard the rest. Early ChatGPT's approach.
- Pro: simple to implement, token-predictable
- Con: loses critical background after N messages; no cross-thread support at all
Approach C: RAG-only
Embed all history, retrieve the K most relevant pieces each time. The core idea behind memory systems like Mem0.
- Pro: theoretically unlimited history, token-controlled
- Con: structural information lost (chronological order, causality, nesting); pronoun resolution fails ('this', 'it', 'as mentioned above')
Approach D: Summary chain
Incrementally compress history, maintain an always-current summary. Claude's conversation summarization takes this approach.
- Pro: structure preserved, token-controlled
- Con: summaries inevitably lose detail; cross-thread semantic associations still absent
Why all three layers are indispensable
Sliding window alone: the thread forgets once it's long enough, and cross-thread awareness is zero.
Summary chain alone: knows ancestor background, but sibling threads' discussions are permanently invisible, and file content can't be searched.
RAG alone: can retrieve related content, but has no idea about the current sub-thread's structural origin (which pin opened it) — anchor information is lost.
Deeppin's combination
Summary chain (inter-thread) + fixed window (intra-thread) + RAG (cross-thread semantics) = structural completeness + recent detail + semantic association. Each mechanism's weakness is covered by the others:
Summary chain → structural background (where I came from) ✓ detail loss ✓ Fixed window → recent detail (what just happened) ✓ long-term × RAG → semantic association (where was this discussed)✓ structure × Combined: Structural background ✓ Recent detail ✓ Semantic association ✓ Token-controlled ✓
§ 03Part 3 — Actual token distribution
In a depth-3 sub-thread, typical token allocation across context layers:
Main thread summary 150 tokens ~9% Pin A summary 300 tokens ~18% Pin B summary 500 tokens ~30% Anchor text 50 tokens ~3% RAG file chunks 400 tokens ~24% RAG conversation mem 200 tokens ~12% Current conversation 70 tokens ~4% (10 messages) ────────────────────────────────────────── Total 1670 tokens ← well below the 7200-token cap
The ancestor summary chain is the largest portion (~61%), which makes sense — a sub-thread exists to drill into a specific background, and that background is the most important information.