Response Optimizations That Matter for User Experience

From UUID pre-generation and SSE streaming to Nginx configuration and frontend rendering — every engineering detail that affects perceived speed.

2026-04-15 · 10 min read · performance, SSE, UX

Perceived speed ≠ actual latency. A system that starts showing content after 2 seconds feels slower than one that starts streaming character-by-character after 0.5 seconds, even if total completion time is the same. Deeppin's optimization focus is Time to First Token and eliminating the sense of waiting.
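To make the distinction concrete, here is a minimal sketch of measuring time to first token separately from total completion time. The `measureStream` helper and its injectable `now` clock are hypothetical names used for illustration; they are not taken from Deeppin's code.

```typescript
// Hypothetical helper: consumes a token stream and reports when the first
// token arrived versus when the whole stream finished.
interface StreamTiming {
  firstTokenMs: number | null; // null if the stream produced no tokens
  totalMs: number;
  text: string;
}

async function measureStream(
  tokens: AsyncIterable<string>,
  now: () => number = () => Date.now(),
): Promise<StreamTiming> {
  const start = now();
  let firstTokenMs: number | null = null;
  let text = "";
  for await (const token of tokens) {
    // Record the elapsed time only once, on the first token.
    if (firstTokenMs === null) firstTokenMs = now() - start;
    text += token;
  }
  return { firstTokenMs, totalMs: now() - start, text };
}
```

Two responses with the same `totalMs` can have very different `firstTokenMs` values, and it is the second number that users feel.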

§ 01Part 1 — UUID pre-generation: zero-wait new conversations

Traditional flow: click 'New Chat' → request creates session → wait for DB write → navigate. That's 200–600ms of perceived waiting.

Deeppin's approach: generate a UUID client-side immediately after login. On click, navigate instantly using that UUID. The chat page creates the DB record lazily on initialization.

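import { useRef } from "react";

// Inside the component: holds a UUID generated ahead of time (e.g. right after login)
const prewarmedRef = useRef<string | null>(null);
const prewarm = () => { prewarmedRef.current = crypto.randomUUID(); };

const handleNewChat = () => {
  // Use the pre-generated id; fall back to a fresh one if none is ready
  const id = prewarmedRef.current ?? crypto.randomUUID();
  router.push(`/chat/${id}`);  // immediate navigation
  prewarm();                    // pre-generate for next time
};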

Because navigation no longer waits on a network round trip, the 200–600ms wait drops to effectively zero perceived latency.
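Because the id exists on the client before the database knows about it, the lazy creation step on the chat page should be safe to run more than once (for example on a reload, or when initialization fires twice). The following is a minimal sketch under that assumption; `SessionStore` and `ensureSession` are hypothetical names standing in for whatever backend call Deeppin actually uses.

```typescript
// Hypothetical storage interface standing in for the real backend call.
interface SessionRow {
  id: string;
}

interface SessionStore {
  get(id: string): Promise<SessionRow | undefined>;
  create(id: string): Promise<SessionRow>;
}

// In-flight creations keyed by id, so concurrent callers share one request.
const inflight = new Map<string, Promise<SessionRow>>();

async function ensureSession(store: SessionStore, id: string): Promise<SessionRow> {
  const pending = inflight.get(id);
  if (pending) return pending;

  const p = (async () => {
    try {
      // Create the record only if it does not exist yet.
      return (await store.get(id)) ?? (await store.create(id));
    } finally {
      inflight.delete(id);
    }
  })();
  inflight.set(id, p);
  return p;
}
```

With this shape, the chat page can call `ensureSession` unconditionally during initialization without risking duplicate records for the same UUID.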

§ 02Part 2 — Initial message passing

When a user types a message on the home page and clicks send, the app must navigate to the chat page and send that message from there. The message is passed between the two pages through sessionStorage:

// Home page: save message then navigate
sessionStorage.setItem("deeppin:pending-msg", message.trim());
router.push(`/chat/${id}`);

// Chat page: read on initialization
const pending = sessionStorage.getItem("deeppin:pending-msg");
if (pending) {
  sessionStorage.removeItem("deeppin:pending-msg");
  await sendMessage(pending);
}
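The read-then-remove step is what keeps a refresh or a second initialization from sending the message twice. As a rough sketch, it can be pulled into one helper; `takePendingMessage` and the `KeyValueStorage` interface are hypothetical names, with the interface mirroring the subset of the Web Storage API the snippet above relies on.

```typescript
// Minimal subset of the Web Storage API, so the helper can be exercised
// without a browser.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const PENDING_KEY = "deeppin:pending-msg";

// Read the pending message and clear it in one step, so a later call
// (re-render, refresh) finds nothing left to send.
function takePendingMessage(storage: KeyValueStorage): string | null {
  const pending = storage.getItem(PENDING_KEY);
  if (pending === null) return null;
  storage.removeItem(PENDING_KEY);
  return pending.length > 0 ? pending : null;
}
```

On the chat page this would be called as `takePendingMessage(sessionStorage)`, sending the result only when it is non-null.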

§ 03Part 3 — SSE streaming

LLM responses stream token-by-token via SSE rather than waiting for full generation. Users see content appear while the LLM is still generating.

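# FastAPI async generator
import json
from fastapi.responses import StreamingResponse

async def stream_response():
    # `router` and `params` come from the surrounding handler (not shown);
    # the streaming call must return an async iterator for `async for` to work
    async for chunk in router.completion(**params, stream=True):
        token = chunk.choices[0].delta.content or ""
        if token:
            yield f"data: {json.dumps({'type':'token','text':token})}\n\n"
    yield "data: [DONE]\n\n"  # end-of-stream sentinel (not JSON)

# Inside the route handler:
return StreamingResponse(stream_response(), media_type="text/event-stream")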

The frontend receives tokens via EventSource and appends each one to the current message:

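const source = new EventSource(`/api/threads/${threadId}/chat`);
source.onmessage = (e) => {
  // The end-of-stream sentinel is not JSON, so handle it before parsing
  if (e.data === "[DONE]") {
    source.close();  // otherwise EventSource reconnects automatically
    return;
  }
  const data = JSON.parse(e.data);
  if (data.type === "token") {
    setCurrentMessage(prev => prev + data.text);
  }
};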
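On the wire, each event is a `data:` line followed by a blank line, and network chunks do not necessarily line up with event boundaries. EventSource handles that buffering internally, but a client that reads the stream by hand has to do it itself. The sketch below shows the idea; `SseParser` is a hypothetical name, and it deliberately handles only `data:` fields with `\n` line endings, which is all the server code above emits.

```typescript
type ChatEvent = { type: "token"; text: string } | { type: "done" };

class SseParser {
  private buffer = "";

  // Feed one network chunk; returns every event that chunk completed.
  push(chunk: string): ChatEvent[] {
    this.buffer += chunk;
    const events: ChatEvent[] = [];
    let end: number;
    // A blank line ("\n\n") terminates each event.
    while ((end = this.buffer.indexOf("\n\n")) !== -1) {
      const frame = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      const data = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data === "[DONE]") {
        events.push({ type: "done" });
      } else if (data !== "") {
        const parsed = JSON.parse(data);
        if (parsed.type === "token") {
          events.push({ type: "token", text: String(parsed.text) });
        }
      }
    }
    return events;
  }
}
```

An event split across two chunks stays in the buffer until its terminating blank line arrives, so tokens are never parsed half-finished.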

§ 04Part 4 — Nginx: buffering must be disabled

Leaving Nginx's default proxy buffering on is the most common mistake. Nginx buffers proxied responses by default, accumulating chunks before forwarding them. For SSE, this means tokens batch up and arrive all at once, which destroys the streaming effect.

location / {
    proxy_pass http://localhost:8000;
    proxy_buffering off;     # critical
    proxy_cache off;         # critical
    proxy_read_timeout 300s; # LLM can be slow
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
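As a complement to the config above, Nginx also honors an `X-Accel-Buffering: no` response header from the upstream, which disables buffering for that one response only. The sketch below collects the headers an SSE response would typically carry. `sseHeaders` is a hypothetical helper, written in TypeScript for consistency with the frontend snippets; the same header names would apply when set from the FastAPI handler.

```typescript
// Hypothetical helper: headers for a single SSE response.
function sseHeaders(): Record<string, string> {
  return {
    "Content-Type": "text/event-stream",
    // Ask intermediaries and the browser not to cache the stream.
    "Cache-Control": "no-cache",
    // Tells Nginx to skip proxy buffering for this response.
    "X-Accel-Buffering": "no",
  };
}
```

Setting the header per response is useful when the same `location` block also serves ordinary JSON endpoints that benefit from buffering.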

§ 05Part 5 — Per-thread stream state in Zustand

Deeppin supports concurrent streaming across multiple threads (main thread and several pins simultaneously). Each thread has isolated stream state keyed by threadId in Zustand, so concurrent streams never interfere:

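// useStreamStore.ts
import { create } from "zustand";

interface ThreadStream {
  isStreaming: boolean;
  content: string;
  error: string | null;
}

interface StreamStore {
  streams: Record<string, ThreadStream>;
  appendToken: (threadId: string, token: string) => void;
  setStreaming: (threadId: string, value: boolean) => void;
}

// Defaults for a thread that has no entry yet
const EMPTY: ThreadStream = { isStreaming: false, content: "", error: null };

export const useStreamStore = create<StreamStore>((set) => ({
  streams: {},
  // Isolated by threadId: updating one thread never touches another
  appendToken: (threadId, token) =>
    set((state) => {
      const prev = state.streams[threadId] ?? EMPTY;
      return {
        streams: {
          ...state.streams,
          [threadId]: { ...prev, content: prev.content + token },
        },
      };
    }),
  setStreaming: (threadId, value) =>
    set((state) => ({
      streams: {
        ...state.streams,
        [threadId]: { ...(state.streams[threadId] ?? EMPTY), isStreaming: value },
      },
    })),
}));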
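To see the isolation in action, the `appendToken` update can be exercised outside Zustand as a pure function. This is a sketch only: the thread ids `main` and `pin-1` and the interleaved arrival order are made up for illustration.

```typescript
interface ThreadStreamState {
  isStreaming: boolean;
  content: string;
  error: string | null;
}
type Streams = Record<string, ThreadStreamState>;

const EMPTY_STREAM: ThreadStreamState = { isStreaming: false, content: "", error: null };

// Pure version of the update: returns a new map and leaves every other
// thread's entry untouched (same object reference).
function appendTokenPure(streams: Streams, threadId: string, token: string): Streams {
  const prev = streams[threadId] ?? EMPTY_STREAM;
  return { ...streams, [threadId]: { ...prev, content: prev.content + token } };
}

// Tokens from two threads arriving interleaved.
const arrivals: Array<[string, string]> = [
  ["main", "Hel"],
  ["pin-1", "Foo"],
  ["main", "lo"],
  ["pin-1", "Bar"],
];

const finalStreams = arrivals.reduce<Streams>(
  (acc, [id, tok]) => appendTokenPure(acc, id, tok),
  {},
);
// finalStreams["main"].content is "Hello"; finalStreams["pin-1"].content is "FooBar"
```

Because each update copies only the entry for its own `threadId`, components subscribed to one thread's slice do not see spurious changes when another thread receives a token.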

§ 06Part 6 — Streaming Markdown rendering

Markdown markers like **bold** appear malformed mid-stream (an opening ** whose closing pair has not arrived yet). Solution: show raw text during streaming and offer a toggle to rendered Markdown once the response completes. Users can switch between the two views at any time.
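The mid-stream problem is easy to reproduce. The check below is a rough heuristic, not a Markdown parser: `hasUnclosedBold` is a hypothetical helper that only counts `**` markers and ignores code spans and escaping.

```typescript
// Returns true when the text contains an odd number of `**` markers,
// i.e. a bold span has been opened but not yet closed.
function hasUnclosedBold(text: string): boolean {
  const markers = text.match(/\*\*/g) ?? [];
  return markers.length % 2 === 1;
}

// Partial content as it might look after successive tokens.
const snapshots = ["This is ", "This is **bo", "This is **bold**"];
const unclosed = snapshots.map(hasUnclosedBold);
```

The middle snapshot is the state a renderer would see mid-stream, which is why rendering on every token produces flickering or stray asterisks.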