Between-Thread Context: Ancestor Summary Chain and Anchors
When a pin opens a sub-thread, how key information from ancestor conversations is passed down — ancestor chain traversal, depth-based token budgets, and anchor text preservation.
Deeppin's thread structure is a tree. The main thread is the root, each pin opens a child node, and child nodes can themselves be pinned, to any depth. When a user sends a message in a sub-thread, the LLM needs to know where that sub-thread came from.
If Pin C receives "How does this fundamentally differ from CNN's local receptive fields?", the LLM must understand "this" refers to multi-head attention, which was being discussed in the context of attention mechanisms. That background lives in the ancestor threads.
§ 01 Part 1 — Ancestor chain traversal
Each thread stores a parent_thread_id. Rather than recursively querying the database N times, we fetch all threads in the session once and traverse in memory:
# Fetch all threads in the session in one query
all_threads_res = await _db(
    lambda: get_supabase().table("threads")
    .select("id, parent_thread_id, depth, anchor_text")
    .eq("session_id", session_id)
    .execute()
)
all_threads = {t["id"]: t for t in all_threads_res.data}

# In-memory traversal upward
ancestor_chain = []
current = thread
while current.get("parent_thread_id"):
    parent = all_threads.get(current["parent_thread_id"])
    if not parent:
        break
    ancestor_chain.append(parent)
    current = parent

# Reverse to root-first: [main, Pin A, Pin B]
ancestors_root_first = list(reversed(ancestor_chain))

§ 02 Part 2 — Depth-based token budgets
Each ancestor is compressed into a summary. Token budgets are allocated by distance: closer ancestors are more relevant and get larger budgets; further ancestors are compressed more aggressively.
# Indexed by distance from the current thread (0 = direct parent)
_BUDGETS_BY_DEPTH = [800, 500, 300, 150]

# Allocate budgets for [main, Pin A, Pin B]
# Direct parent (Pin B) gets the largest budget
budgets = [_budget_for_depth(i) for i in reversed(range(len(ancestors_root_first)))]
# → [300, 500, 800] (main=300, Pin A=500, Pin B=800)
All ancestor summaries are fetched concurrently:
summaries = await asyncio.gather(*[
    _get_or_create_summary(anc["id"], budget)
    for anc, budget in zip(ancestors_root_first, budgets)
])
The summaries are injected root-first so the LLM reads the background chronologically:
[system] [Main thread summary]    ← ≤300 tokens, earliest context
[system] [Depth-1 thread summary] ← ≤500 tokens
[system] [Depth-2 thread summary] ← ≤800 tokens, most recent context
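The allocation and injection steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not Deeppin's actual code: `budget_for_distance` assumes a plain table lookup that clamps to the last entry for very deep chains, `fake_summary` is a hypothetical stand-in for `_get_or_create_summary`, and the thread ids are invented.

```python
import asyncio

# Budget table from the article; index = distance from the current thread
# (0 = direct parent). The clamping behavior below is an assumption.
BUDGETS_BY_DISTANCE = [800, 500, 300, 150]


def budget_for_distance(distance: int) -> int:
    """Look up the budget, reusing the smallest one for very distant ancestors."""
    return BUDGETS_BY_DISTANCE[min(distance, len(BUDGETS_BY_DISTANCE) - 1)]


def allocate_budgets(ancestors_root_first: list[dict]) -> list[int]:
    """Return one budget per ancestor, in the same root-first order."""
    n = len(ancestors_root_first)
    return [budget_for_distance(n - 1 - i) for i in range(n)]


async def fake_summary(thread_id: str, budget: int) -> str:
    """Hypothetical stand-in for _get_or_create_summary (no database or LLM call)."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return f"summary of {thread_id} (<= {budget} tokens)"


async def build_summary_messages(ancestors_root_first: list[dict]) -> list[dict]:
    budgets = allocate_budgets(ancestors_root_first)
    # gather() returns results in the order the awaitables were passed,
    # so the summaries stay root-first.
    summaries = await asyncio.gather(*[
        fake_summary(anc["id"], budget)
        for anc, budget in zip(ancestors_root_first, budgets)
    ])
    return [{"role": "system", "content": s} for s in summaries]


def demo() -> list[dict]:
    chain = [{"id": "main"}, {"id": "pin_a"}, {"id": "pin_b"}]
    return asyncio.run(build_summary_messages(chain))


if __name__ == "__main__":
    for message in demo():
        print(message["content"])
```

Because `asyncio.gather` preserves input order, the summaries line up with `ancestors_root_first` without any re-sorting, even though the individual calls may finish in a different order.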
§ 03 Part 3 — Anchor text preserved in full
Every pin has an anchor — the exact text the user highlighted. The anchor is the reason the sub-thread exists. It is never compressed:
anchor = thread.get("anchor_text", "")
if anchor:
    prefix.append({
        "role": "system",
        "content": f'The user highlighted the following text and is asking a follow-up. '
                   f'Please focus your answer on this passage:\n"{anchor}"',
    })
Anchors are typically short (tens to hundreds of characters), so the cost of preserving them is low, but the benefit to the LLM's focus is high.
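One way to picture how the verbatim anchor sits next to the compressed summaries is the sketch below. The helper `build_prefix`, the choice to place the anchor message after the summaries, and the sample strings are assumptions made for illustration; only the wording of the anchor message comes from the snippet above.

```python
from typing import Optional


def build_prefix(summaries_root_first: list[str], anchor_text: Optional[str]) -> list[dict]:
    """Assemble system messages: ancestor summaries first, then the anchor.

    The ordering is an assumption; the anchor is inserted verbatim, never trimmed.
    """
    prefix = [{"role": "system", "content": s} for s in summaries_root_first]
    if anchor_text:  # None and "" are both skipped, e.g. a thread without an anchor
        prefix.append({
            "role": "system",
            "content": (
                "The user highlighted the following text and is asking a follow-up. "
                f'Please focus your answer on this passage:\n"{anchor_text}"'
            ),
        })
    return prefix


if __name__ == "__main__":
    anchor = "Multi-head attention runs several attention functions in parallel."
    summaries = ["Main thread: overview of attention mechanisms."]
    for msg in build_prefix(summaries, anchor):
        print(msg["content"])
```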
§ 04 Part 4 — Sibling thread isolation
Sibling pins at the same depth (e.g. Pin A and Pin D, both children of the main thread) are completely isolated — one pin cannot see what happens in another.
This is intentional: each pin is a deep-dive into one specific question. Cross-contamination would blur the LLM's focus. Cross-thread semantic associations are handled by the RAG layer (next article).
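One way to see why siblings stay isolated: if a thread's context is built only from its own ancestor chain, a sibling can never appear in it. The sketch below illustrates this with an invented four-thread session. The `ancestor_ids` helper and its cycle guard are assumptions added for the example, not part of the code shown earlier.

```python
def ancestor_ids(thread_id: str, all_threads: dict[str, dict]) -> list[str]:
    """Walk parent_thread_id links upward and return ancestor ids, root first."""
    chain: list[str] = []
    seen = {thread_id}  # guard against malformed data containing a parent cycle
    current = all_threads[thread_id]
    while current.get("parent_thread_id"):
        parent_id = current["parent_thread_id"]
        if parent_id in seen or parent_id not in all_threads:
            break
        seen.add(parent_id)
        chain.append(parent_id)
        current = all_threads[parent_id]
    return list(reversed(chain))


# Invented session: Pin A and Pin D are siblings; Pin B is a child of Pin A.
THREADS = {
    "main": {"id": "main", "parent_thread_id": None},
    "pin_a": {"id": "pin_a", "parent_thread_id": "main"},
    "pin_d": {"id": "pin_d", "parent_thread_id": "main"},
    "pin_b": {"id": "pin_b", "parent_thread_id": "pin_a"},
}

if __name__ == "__main__":
    print(ancestor_ids("pin_b", THREADS))  # ['main', 'pin_a']
    print(ancestor_ids("pin_d", THREADS))  # ['main']
```

Pin D never shows up in Pin B's chain, and Pin A never shows up in Pin D's, so neither thread's prompt can draw on the other's conversation.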