Article · CI/CD

Zero-Downtime Deployment: Deeppin's CI/CD and Three-Layer Testing

The complete pipeline from git push to production: unit tests → deploy → integration tests, three gates that catch issues at every level, plus how 200+ unit tests cover the core backend logic with zero real dependencies.

2026-04-16 · 25 min read · CI/CD · testing · deployment · Docker

The easiest thing to ignore on a solo full-stack project is deployment — push the code, pray it works. Deeppin had a complete CI/CD pipeline by Day 5, and every git push since then goes through three layers of verification before reaching production.

§ 01 Part 1 - Architecture overview

Frontend and backend have completely separate deployment paths:

Frontend (Next.js): Vercel auto-detects pushes to main and deploys with zero configuration
Backend (FastAPI): GitHub Actions triggers when main is pushed with changes in backend/**

The backend CI/CD pipeline is the focus of this article. It consists of three Jobs chained sequentially — each must pass before the next begins:

git push main (backend/** changed)
  │
  ├── Job 1: unit-test (runs on GitHub Runner, no real dependencies)
  │     └── pytest tests/ --ignore=tests/integration
  │
  ├── Job 2: deploy (SSH to Oracle Cloud, docker compose up)
  │     ├── Start backend + searxng
  │     ├── Wait for healthcheck to pass (up to 120s)
  │     ├── Start nginx
  │     └── Run smoke test (end-to-end connectivity)
  │
  └── Job 3: integration-test (hit real API from GitHub Runner)
        └── pytest tests/integration/

Note: The three Jobs are sequential (linked by needs): unit tests must pass before deploying, deployment must succeed before integration tests run. Any failure at any layer immediately halts the pipeline.
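
The gating behaviour can be modelled in a few lines of Python. This is only a toy sketch of the "stop at the first failed gate" rule, not how GitHub Actions is implemented; `run_pipeline` and the job callables are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Tuple

Job = Tuple[str, Callable[[], bool]]


def run_pipeline(jobs: List[Job]) -> List[str]:
    """Run jobs in order and stop at the first failure, like a chain of `needs`."""
    log: List[str] = []
    for name, job in jobs:
        ok = job()
        log.append(f"{name}: {'passed' if ok else 'failed'}")
        if not ok:
            break  # downstream jobs are never started
    return log
```

With a failing deploy step, `run_pipeline` returns entries for unit-test and deploy only; integration-test never appears in the log, which mirrors a skipped downstream Job.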

§ 02 Part 2 - Layer 1: Unit tests (200+, zero real dependencies)

Unit tests run on GitHub Actions Ubuntu runners with no Supabase, no Groq, no SearXNG — all external dependencies are mocked. The goal is to verify pure logic correctness.

Test coverage

Currently, 11 test files cover every core backend module:

test_stream_manager.py — all META parsing fallback paths (complete JSON, truncation repair, regex extraction, empty input), SSE formatting, cross-chunk sentinel stripping, full conversation round flow
test_llm_client.py — chat_stream, summarizer, merge_threads, classify_search_intent, vision routing and output format
test_context_builder.py — sliding window, summary injection, RAG context assembly, ancestor chain
test_attachment_processor.py — semantic chunking, vector embedding, file type handling
test_memory_service.py — conversation memory storage/retrieval, RAG search, long text chunking
test_merge_router.py — merge output format options, token budget allocation, content truncation
test_search_service.py — SearXNG calls, result injection, timeout degradation
test_embedding_service.py — model loading, vector dimensions, batch encoding
test_auth_dependency.py — JWT validation, no token / invalid token / expired token
test_session_messages.py — session CRUD, bulk message reads
test_sessions_auth.py — auth middleware interception on every endpoint
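
To make the META fallback paths listed for test_stream_manager.py concrete, here is a minimal sketch of such a chain: full JSON parse, then truncation repair, then regex extraction, with empty input short-circuited. The article does not show the real parser, so `parse_meta`, `_close_truncated`, and the sample field names are assumptions made purely for illustration.

```python
import json
import re
from typing import Optional

# Matches complete "key": "string value" pairs (values are kept un-decoded).
_PAIR = re.compile(r'"(\w+)"\s*:\s*"((?:[^"\\]|\\.)*)"')


def _close_truncated(raw: str) -> str:
    """Append the quote and brackets a cut-off JSON text is still missing."""
    closers = []
    in_string = escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    return raw + ('"' if in_string else "") + "".join(reversed(closers))


def _try_load(text: str) -> Optional[dict]:
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_meta(raw: str) -> dict:
    """Fallback chain: empty input, full parse, truncation repair, regex."""
    if not raw or not raw.strip():
        return {}
    for candidate in (raw, _close_truncated(raw)):
        parsed = _try_load(candidate)
        if parsed is not None:
            return parsed
    return dict(_PAIR.findall(raw))
```

Each branch maps to one test case: a complete object, an object cut off inside a value (repairable), an object cut off inside a key (only the regex recovers the finished pairs), and empty input.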

Mock strategy

All tests follow one principle: mock external dependencies, test only the module's own logic.

# Typical mock pattern from a stream_manager test (runs inside an async test function).
# sb_mock and fake_chat_stream are test doubles defined earlier in the test file.
from unittest.mock import AsyncMock, patch

from services.stream_manager import stream_and_save

with patch("services.stream_manager.get_supabase", return_value=sb_mock), \
     patch("services.stream_manager.build_context", new=AsyncMock(return_value=[])), \
     patch("services.stream_manager.classify_search_intent", new=AsyncMock(return_value=False)), \
     patch("services.stream_manager.chat_stream", side_effect=fake_chat_stream):
    events = []
    async for event in stream_and_save("thread-1", "user question"):
        events.append(event)

Key detail: patch paths must reference the name in the module under test (services.stream_manager.get_supabase), not the module where it is defined (db.supabase.get_supabase). A from-import copies the reference into the importing module's namespace, so patching the defining module leaves that copy untouched: no error is raised, the mock is never called, and the test runs against the real function.
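
The effect is easy to reproduce in isolation. The sketch below builds two throwaway in-memory modules (`fake_db` and `fake_service` are hypothetical stand-ins for db.supabase and services.stream_manager) and patches each location in turn.

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the defining module (db.supabase in the article).
fake_db = types.ModuleType("fake_db")
fake_db.get_supabase = lambda: "real client"
sys.modules["fake_db"] = fake_db

# Stand-in for the module under test: it from-imports the name, which
# copies the reference into its own namespace.
fake_service = types.ModuleType("fake_service")
exec(
    "from fake_db import get_supabase\n"
    "def load():\n"
    "    return get_supabase()\n",
    fake_service.__dict__,
)
sys.modules["fake_service"] = fake_service


def call_with_patch(target: str) -> str:
    """Patch `target`, then call the code under test."""
    with patch(target, return_value="mock client"):
        return fake_service.load()
```

Patching `fake_db.get_supabase` still yields the real client, because `fake_service` holds its own reference; patching `fake_service.get_supabase` yields the mock.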

CI environment setup

GitHub Actions uses placeholder environment variables so modules can import normally without connecting to real services:

env:
  SUPABASE_URL: https://placeholder.supabase.co
  SUPABASE_SERVICE_ROLE_KEY: placeholder
  SUPABASE_ANON_KEY: placeholder
  GROQ_API_KEYS: '["placeholder"]'
  LOG_DIR: /tmp/deeppin-logs

conftest.py also sets defaults, so running pytest locally requires no .env file at all.
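
One way such defaults can be set is with `setdefault`, so values already exported by CI or a developer's shell win over the placeholders. The article does not show conftest.py, so treat this as an assumed sketch; `CI_DEFAULTS` and `apply_test_defaults` are hypothetical names, and the values simply mirror the workflow's env block.

```python
import os
from typing import MutableMapping

# Placeholder values copied from the CI env block above.
CI_DEFAULTS = {
    "SUPABASE_URL": "https://placeholder.supabase.co",
    "SUPABASE_SERVICE_ROLE_KEY": "placeholder",
    "SUPABASE_ANON_KEY": "placeholder",
    "GROQ_API_KEYS": '["placeholder"]',
    "LOG_DIR": "/tmp/deeppin-logs",
}


def apply_test_defaults(environ: MutableMapping[str, str] = os.environ) -> None:
    """Fill in placeholders only for variables that are not already set."""
    for key, value in CI_DEFAULTS.items():
        environ.setdefault(key, value)
```

In a conftest.py this would be called at import time, before any backend module reads its configuration.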

§ 03 Part 3 - Layer 2: Deployment + Smoke test

After unit tests pass, GitHub Actions SSHs into the Oracle Cloud server. This is not a simple pull-and-restart but an ordered, health-gated deployment:

Deployment sequence

# 1. Pull latest code
git pull origin main

# 2. Start backend + searxng first (nginx waits for them)
docker compose up -d --build backend searxng

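# 3. Wait for backend healthcheck to pass (up to 120s); abort the deploy otherwise
STATUS=unknown
for i in $(seq 1 24); do
  STATUS=$(docker inspect --format='{{.State.Health.Status}}' deeppin-backend-1)
  [ "$STATUS" = "healthy" ] && break
  sleep 5
done
[ "$STATUS" = "healthy" ] || { echo "backend not healthy after 120s (status: $STATUS)" >&2; exit 1; }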

# 4. Start nginx only after backend is healthy
docker compose up -d nginx

# 5. Clean up old images
docker image prune -f

# 6. Run smoke test
bash scripts/smoke_test.sh https://deeppin.duckdns.org
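
The polling in step 3 follows a common "retry until healthy or give up" shape. Below is a minimal Python sketch of that shape, with the probe and the sleep function injected so it can be tested without Docker; `wait_until_healthy` is a hypothetical helper, not part of the deploy script.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], str],
    attempts: int = 24,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` up to `attempts` times; report whether it became healthy."""
    for attempt in range(attempts):
        if probe() == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to wait after the final attempt
    return False
```

Returning a boolean lets the caller decide what a timeout means; in a deploy script the natural reaction is to exit non-zero rather than continue to the next step.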

Aggregated health check (/health)

The entire deployment flow hinges on the /health endpoint. Docker's healthcheck calls it every 15 seconds, and it concurrently probes the components the backend needs at runtime: SearXNG, Supabase, and the embedding model.

# health.py — concurrent component checks
# LLM probe removed 2026-04-16: provider-side rate limits caused spurious unhealthy
# flags that tripped nginx's depends_on chain. Key validity now lives in the zero-quota
# /health/providers/keys endpoint instead.
searxng_ok, supabase_ok, embedding_info = await asyncio.gather(
    _check_searxng(),      # SearXNG search engine reachable
    _check_supabase(),     # Supabase database connection healthy
    _check_embedding(),    # bge-m3 model loaded, dim=1024, similarity > 0.5
)

all_ok = searxng_ok and supabase_ok and embedding_info["ok"]
# 200 = healthy, 503 = degraded → Docker marks unhealthy

Nginx's depends_on is set to condition: service_healthy, so Compose starts the nginx container only after the backend reports healthy, which in turn requires SearXNG, Supabase, and the embedding model to pass their checks. On startup, users never reach a half-initialized service.
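
The aggregation pattern behind /health can be sketched as a self-contained coroutine. The per-check timeout and the exception guard are additions that the excerpt from health.py does not show, and `aggregate_health` plus the two stub probes are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Check = Callable[[], Awaitable[bool]]


async def aggregate_health(
    checks: Dict[str, Check], timeout: float = 5.0
) -> Tuple[int, Dict[str, bool]]:
    """Run all checks concurrently; return (HTTP status, per-component result)."""

    async def guarded(check: Check) -> bool:
        # A slow or crashing probe counts as unhealthy instead of
        # failing the whole endpoint.
        try:
            return bool(await asyncio.wait_for(check(), timeout))
        except Exception:
            return False

    results = await asyncio.gather(*(guarded(c) for c in checks.values()))
    components = dict(zip(checks.keys(), results))
    return (200 if all(results) else 503), components


# Stub probes for demonstration only.
async def always_ok() -> bool:
    return True


async def always_down() -> bool:
    raise RuntimeError("dependency unreachable")
```

Because the probes run under `asyncio.gather`, the endpoint's latency is bounded by the slowest check rather than the sum of all checks.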

Smoke test

Immediately after deployment, a bash script verifies end-to-end connectivity from the external HTTPS endpoint:

HTTPS reachable + returns valid JSON
backend / searxng / supabase / embedding all report true (LLM key validity is covered separately by the zero-quota /health/providers/keys)
Embedding model dimension is 1024, model name contains bge-m3
Unauthenticated request correctly returns 401

Any failure causes the script to exit non-zero, GitHub Actions turns red, and CI email notifications fire.
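
The real smoke test is a bash script; the same assertions are sketched here in Python for readability. The response shape is an assumption: the flat keys `embedding_dim` and `embedding_model`, and the function name `smoke_failures`, are hypothetical and may not match the actual /health payload.

```python
from typing import Any, Dict, List

REQUIRED_COMPONENTS = ("backend", "searxng", "supabase", "embedding")


def smoke_failures(health: Dict[str, Any], unauth_status: int) -> List[str]:
    """Return readable failures; an empty list means the smoke test passes."""
    failures: List[str] = []
    for name in REQUIRED_COMPONENTS:
        if health.get(name) is not True:
            failures.append(f"{name} is not reporting true")
    if health.get("embedding_dim") != 1024:
        failures.append("embedding dimension is not 1024")
    if "bge-m3" not in str(health.get("embedding_model", "")):
        failures.append("embedding model name does not contain bge-m3")
    if unauth_status != 401:
        failures.append(
            f"unauthenticated request returned {unauth_status}, expected 401"
        )
    return failures
```

Collecting every failure before exiting, rather than stopping at the first, makes the CI log more useful when several components are down at once.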

§ 04 Part 4 - Layer 3: Integration tests (real API, real auth)

Smoke tests verify connectivity; integration tests verify business logic. They send real HTTP requests from the GitHub Runner to the live production API.

Dynamic test users

Integration tests don't depend on any pre-existing accounts. They dynamically create temporary users via the Supabase Admin API, then auto-delete them after tests complete:

# test fixture (session scope — one user shared across all tests)
test_email = f"ci-{uuid4().hex[:8]}@deeppin-ci.test"
create_r = httpx.post(
    f"{supabase_url}/auth/v1/admin/users",
    headers=admin_headers,
    json={"email": test_email, "password": random_password, "email_confirm": True},
)
# yield fixture auto-cleans up after tests
yield auth_headers
httpx.delete(f"{supabase_url}/auth/v1/admin/users/{user_id}", ...)
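
The create, yield, delete shape is what makes the cleanup automatic. The sketch below shows it against an in-memory stand-in so it runs without Supabase or pytest; `FakeAdminApi` and `temp_user` are hypothetical, and in a real suite the generator would be decorated with `@pytest.fixture(scope="session")`.

```python
from typing import Dict, Iterator
from uuid import uuid4


class FakeAdminApi:
    """In-memory stand-in for the Supabase Admin API (illustration only)."""

    def __init__(self) -> None:
        self.users: Dict[str, str] = {}

    def create_user(self, email: str) -> str:
        user_id = uuid4().hex
        self.users[user_id] = email
        return user_id

    def delete_user(self, user_id: str) -> None:
        self.users.pop(user_id, None)


def temp_user(admin: FakeAdminApi) -> Iterator[str]:
    """Generator with the shape of a yield fixture: setup, yield, teardown."""
    email = f"ci-{uuid4().hex[:8]}@deeppin-ci.test"
    user_id = admin.create_user(email)
    try:
        yield user_id  # tests run while the generator is suspended here
    finally:
        admin.delete_user(user_id)  # teardown: remove the temporary user
```

The random suffix in the email keeps concurrent CI runs from colliding, and the teardown leaves no test accounts behind in the auth table.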

Test coverage

TestHealth — /health endpoint reachable, each component status correct
TestAuth — no token returns 401, invalid token returns 401, missing Bearer prefix returns 401
TestSession — create → appears in list → fetch detail → delete → nonexistent returns 404 (full lifecycle)
TestProviders — individually verify every provider + key combination is functional, catching dead keys

Note: Integration tests run after deployment (needs: deploy), hitting the real live API. If they fail, the deployment succeeded but business functionality is broken, which is something neither unit tests nor smoke tests can catch.

§ 05 Part 5 - Docker orchestration

Production runs on Oracle Cloud's Always Free ARM instance (4 cores, 24 GB RAM), managed by Docker Compose with three services:

services:
  backend:       # FastAPI + uvicorn + LiteLLM + bge-m3
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8000/health | grep -q '\"status\":\"ok\"'"]
      interval: 15s
      retries: 5
      start_period: 45s   # embedding model needs time to load

  searxng:       # Search engine (no separate healthcheck; backend /health covers it)

  nginx:         # Reverse proxy + HTTPS (Let's Encrypt)
    depends_on:
      backend:
        condition: service_healthy  # healthy = all dependencies ready

Startup chain: backend + searxng start in parallel → backend passes healthcheck (which includes searxng connectivity) → nginx starts → traffic flows. This chain guarantees users never hit a half-initialized service.

§ 06 Part 6 - Local development loop

CI/CD protects production; local development has its own fast feedback loop:

# Standard flow after writing code
cd backend && pytest tests/ -q        # Run unit tests locally (~25s)
git add . && git commit -m "feat: ..." # Commit after passing
git push                               # Triggers CI/CD

# Debugging CI failures
gh run view --log                      # View Actions logs
docker compose logs backend --tail 50  # SSH to server, check logs

Local tests complete in about 25 seconds; a full CI run takes 3-5 minutes (including deployment and integration tests). The vast majority of issues are caught at the local unit test stage.

§ 07 Part 7 - What this system prevents

Regressions: 200+ unit tests cover the known edge cases in context building, META parsing, streaming truncation, and more
Deployment incidents: healthcheck + smoke test ensure the service is fully ready before receiving traffic
Configuration drift: integration tests verify real API keys, real database, real auth chain
Environment differences: Docker ensures the code runs in the same containerized environment in dev and production
Slow debugging: three-layer testing narrows the search — which layer failed tells you where to look

On a solo project, you can't rely on manual checking. You rely on automated confidence. Every git push has three gates standing behind you.