
Between-Thread Context: Ancestor Summary Chain and Anchors

How key information from ancestor conversations is passed down when a pin opens a sub-thread: ancestor chain traversal, depth-based token budgets, and anchor text preservation.

2026-04-15 · 9 min read · context · threads · architecture

Deeppin's threads form a tree. The main thread is the root, each pin opens a child node, and child nodes can themselves be pinned, to arbitrary depth. When a user sends a message in a sub-thread, the LLM needs to know where that sub-thread came from.

Fig. 1 · Thread tree

If Pin C receives the question "How does this fundamentally differ from CNN's local receptive fields?", the LLM must understand that "this" refers to multi-head attention, which was raised in a discussion of attention mechanisms. That background lives in the ancestor threads, not in Pin C itself.

§ 01 · Part 1: Ancestor chain traversal

Each thread stores a parent_thread_id. Rather than querying the database once per ancestor, we fetch all threads in the session in a single query and traverse the tree in memory:

# Fetch all threads in the session in one query
all_threads_res = await _db(
    lambda: get_supabase().table("threads")
    .select("id, parent_thread_id, depth, anchor_text")
    .eq("session_id", session_id)
    .execute()
)
all_threads = {t["id"]: t for t in all_threads_res.data}

# In-memory traversal upward
ancestor_chain = []
current = thread
while current.get("parent_thread_id"):
    parent = all_threads.get(current["parent_thread_id"])
    if not parent:
        break  # dangling parent reference: stop walking
    ancestor_chain.append(parent)
    current = parent

# Reverse to root-first: [main, Pin A, Pin B]
ancestors_root_first = list(reversed(ancestor_chain))

Fetching all threads in one query avoids N serial database round-trips, one per ancestor level. A session typically holds tens to hundreds of threads, so the in-memory traversal costs next to nothing.
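A self-contained sketch of the same upward walk, assuming the query result has already been loaded into a dict keyed by thread id. The helper name `ancestor_chain_root_first` and the sample rows are hypothetical, only `id` and `parent_thread_id` are kept, and the `seen` set is an added safeguard against corrupted parent links that the snippet above does not include.

```python
from typing import Any, Dict, List

Thread = Dict[str, Any]


def ancestor_chain_root_first(thread: Thread, all_threads: Dict[str, Thread]) -> List[Thread]:
    """Walk parent_thread_id links upward in memory; return ancestors root-first."""
    chain: List[Thread] = []
    seen = {thread["id"]}  # added safeguard: stop if a corrupted link forms a cycle
    current = thread
    while current.get("parent_thread_id"):
        parent = all_threads.get(current["parent_thread_id"])
        if parent is None or parent["id"] in seen:
            break  # dangling parent reference or cycle: stop walking
        seen.add(parent["id"])
        chain.append(parent)
        current = parent
    chain.reverse()  # collected nearest-first; flip to root-first
    return chain


# Hypothetical rows, as returned by the single per-session query (other columns omitted).
rows = [
    {"id": "main", "parent_thread_id": None},
    {"id": "pin_a", "parent_thread_id": "main"},
    {"id": "pin_b", "parent_thread_id": "pin_a"},
    {"id": "pin_c", "parent_thread_id": "pin_b"},
    {"id": "pin_d", "parent_thread_id": "main"},
]
all_threads = {t["id"]: t for t in rows}

chain = ancestor_chain_root_first(all_threads["pin_c"], all_threads)
print([t["id"] for t in chain])  # ['main', 'pin_a', 'pin_b']
```

Once the dict is built, the walk costs one dict lookup per level, so it scales with the thread's depth rather than with the number of threads in the session.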

§ 02 · Part 2: Depth-based token budgets

Each ancestor is compressed into a summary. Token budgets are allocated by distance from the current thread: closer ancestors are more relevant and get larger budgets, while more distant ancestors are compressed more aggressively.

_BUDGETS_BY_DEPTH = [800, 500, 300, 150]
# Indexed by distance from the current thread:
# 0 (direct parent) → 800, 1 → 500, 2 → 300, 3 → 150

# Allocate budgets for [main, Pin A, Pin B]
# Direct parent (Pin B) gets the largest budget
budgets = [_budget_for_depth(i) for i in reversed(range(len(ancestors_root_first)))]
# → [300, 500, 800]  (main=300, Pin A=500, Pin B=800)

All ancestor summaries are fetched concurrently:

summaries = await asyncio.gather(*[
    _get_or_create_summary(anc["id"], budget)
    for anc, budget in zip(ancestors_root_first, budgets)
])

They are injected root-first, so the LLM reads the background in chronological order:

[system] [Main thread summary]       ← ≤300 tokens, earliest context
[system] [Depth-1 thread summary]    ← ≤500 tokens
[system] [Depth-2 thread summary]    ← ≤800 tokens, most recent context

§ 03 · Part 3: Anchor text preserved in full

Every pin has an anchor: the exact text the user highlighted. The anchor is the reason the sub-thread exists, so it is never compressed and is injected verbatim:

anchor = thread.get("anchor_text", "")
if anchor:
    prefix.append({
        "role": "system",
        "content": f'The user highlighted the following text and is asking a follow-up. '
                   f'Please focus your answer on this passage:\n"{anchor}"',
    })

Anchors are typically short (tens to hundreds of characters), so preserving them costs little, while keeping the LLM focused on the passage that prompted the question.
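A sketch of how the pieces could be assembled into the message prefix, assuming (the article does not say) that the anchor message follows the ancestor summaries and that the user's new message comes last. `build_context_prefix` is a hypothetical helper, and the summaries and anchor text below are placeholders.

```python
from typing import Dict, List, Optional


def build_context_prefix(summaries: List[str], anchor: Optional[str]) -> List[Dict[str, str]]:
    # Ancestor summaries first, root-first, one system message each.
    prefix = [{"role": "system", "content": s} for s in summaries]
    # The anchor is appended verbatim and never compressed; None or "" adds nothing.
    if anchor:
        prefix.append({
            "role": "system",
            "content": "The user highlighted the following text and is asking a follow-up. "
                       f'Please focus your answer on this passage:\n"{anchor}"',
        })
    return prefix


prefix = build_context_prefix(
    ["<main summary>", "<Pin A summary>", "<Pin B summary>"],  # placeholder summaries
    "multi-head attention",  # hypothetical anchor text for Pin C
)
messages = prefix + [
    {"role": "user", "content": "How does this fundamentally differ from CNN's local receptive fields?"}
]
for m in messages:
    print(m["role"], "|", m["content"][:60])
```

Keeping the anchor as the last system message puts the highlighted passage immediately before the question that refers to it.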

§ 04 · Part 4: Sibling thread isolation

Sibling pins (e.g. Pin A and Pin D, both children of the main thread) are completely isolated: one pin cannot see what happens in the other.

This is intentional: each pin is a deep-dive into one specific question. Cross-contamination would blur the LLM's focus. Cross-thread semantic associations are handled by the RAG layer (next article).
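In this design, isolation falls out of the traversal itself: context is gathered only along the parent chain, so a sibling never enters the picture. The sketch below makes that visible with a hypothetical `visible_thread_ids` helper over a plain `parent_of` map that mirrors the example tree (Pin D is a second child of the main thread).

```python
from typing import Dict, List, Optional


def visible_thread_ids(thread_id: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Threads whose context can reach thread_id: its ancestors (root-first) plus itself."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = thread_id
    while current is not None and current not in seen:  # `seen` stops on a cyclic link
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    return list(reversed(path))


# Hypothetical tree: main → pin_a → pin_b → pin_c, plus pin_d as a second child of main.
parent_of = {"main": None, "pin_a": "main", "pin_b": "pin_a", "pin_c": "pin_b", "pin_d": "main"}

print(visible_thread_ids("pin_c", parent_of))  # ['main', 'pin_a', 'pin_b', 'pin_c']
print(visible_thread_ids("pin_d", parent_of))  # ['main', 'pin_d']
```

Neither list reaches into the other branch: Pin D's conversation never enters Pin C's prompt, and nothing under Pin A enters Pin D's.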