SkinbaseNova/docs/ui/upload-v2-rollout-runbook.md
2026-02-14 15:14:12 +01:00

Upload UI v2 Rollout Runbook

Status

  • Upload UI v2 is production-ready.
  • Feature flag posture: uploads_v2 default ON.
  • Emergency override remains available through SKINBASE_UPLOADS_V2=false.
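The flag posture above (default ON, explicit false as the emergency override) can be sketched as follows. This is an illustrative sketch only: the function name `uploads_v2_enabled`, the set of false-like strings, and the handling of an empty value are assumptions, not the application's actual config code.

```python
import os

# Strings treated as "off". Which values the real config layer accepts
# is an assumption; the runbook only names SKINBASE_UPLOADS_V2=false.
_FALSE_VALUES = {"false", "0", "off", "no"}


def uploads_v2_enabled(env=None):
    """Return True unless SKINBASE_UPLOADS_V2 is set to a false-like value."""
    env = os.environ if env is None else env
    raw = env.get("SKINBASE_UPLOADS_V2")
    if raw is None or raw.strip() == "":
        # Unset (or empty, by assumption) falls back to the default: ON.
        return True
    return raw.strip().lower() not in _FALSE_VALUES
```

Passing a plain dict instead of the process environment keeps the check easy to exercise in a deploy script, for example `uploads_v2_enabled({"SKINBASE_UPLOADS_V2": "false"})`.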

Scope

  • Route: /upload
  • UI: React/Inertia Upload Wizard v2
  • API endpoints in use:
    • POST /api/uploads/init
    • POST /api/uploads/chunk
    • POST /api/uploads/finish
    • GET /api/uploads/{id}/status
    • POST /api/uploads/{id}/publish
    • POST /api/uploads/cancel
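As a rough orientation for smoke testing, the endpoints listed above could be exercised in the order sketched below. The call order is inferred from the endpoint names, and the helper `upload_call_plan` and its `chunk_size` parameter are hypothetical; the real client may batch, retry, or poll status more than once.

```python
def upload_call_plan(file_size, chunk_size):
    """List the (method, path) calls for one successful upload, in assumed order."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    # Ceiling division: number of chunks needed to cover the whole file.
    chunk_count = -(-file_size // chunk_size)
    calls = [("POST", "/api/uploads/init")]
    calls += [("POST", "/api/uploads/chunk")] * chunk_count
    calls.append(("POST", "/api/uploads/finish"))
    calls.append(("GET", "/api/uploads/{id}/status"))
    calls.append(("POST", "/api/uploads/{id}/publish"))
    return calls
```

A cancelled upload would instead end with `POST /api/uploads/cancel` at whatever point the user aborts.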

Legacy Flow Policy

  • Current state: the legacy upload flow remains in the codebase behind the feature-flag branch.
  • Removal decision: scheduled removal (not immediate deletion).
  • Target window: remove legacy branch in the next hardening cycle after stable production operation.
  • Suggested checkpoint gates before removal:
    1. 7 consecutive days with no Sev-1/Sev-2 upload regressions.
    2. Upload completion rate at or above pre-v2 baseline.
    3. No unresolved blockers in publish/cancel/status polling.
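The three checkpoint gates can be evaluated mechanically at the go/no-go review. The sketch below assumes the inputs are collected by hand or from dashboards; the `GateInputs` field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    days_without_sev1_sev2: int      # consecutive days with no Sev-1/Sev-2 upload regressions
    completion_rate: float           # current upload completion rate (0..1)
    baseline_completion_rate: float  # pre-v2 completion rate (0..1)
    open_blockers: int               # unresolved blockers in publish/cancel/status polling


def legacy_removal_gates(inputs):
    """Return the pass/fail result of each gate by name."""
    return {
        "stable_7_days": inputs.days_without_sev1_sev2 >= 7,
        "completion_at_or_above_baseline":
            inputs.completion_rate >= inputs.baseline_completion_rate,
        "no_open_blockers": inputs.open_blockers == 0,
    }


def ready_for_legacy_removal(inputs):
    """All gates must pass before the legacy branch is removed."""
    return all(legacy_removal_gates(inputs).values())
```

Returning the per-gate results, rather than a single boolean, makes it clear in the review which gate blocked removal.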

Rollout Checklist

1) Staging

  • Set SKINBASE_UPLOADS_V2=true in staging env.
  • Build and deploy current commit.
  • Verify upload happy paths:
    • image upload (jpg/png/webp)
    • archive upload with required screenshots
    • cancel in-progress upload
    • publish after ready state
  • Verify failure paths:
    • invalid file type
    • over-size files
    • processing or publish API failures surface the retry/reset options correctly
  • Verify analytics events emitted in browser:
    • upload_start
    • upload_complete
    • upload_publish
    • upload_cancel
    • upload_error
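One way to check the analytics step is to capture the event names seen during a staging session and compare them with the expected set. How the names are captured (browser devtools, an analytics debug view) is left open; `missing_events` is a hypothetical helper.

```python
# Event names taken from the checklist above.
EXPECTED_EVENTS = {
    "upload_start",
    "upload_complete",
    "upload_publish",
    "upload_cancel",
    "upload_error",
}


def missing_events(observed_names):
    """Return the expected events that were not observed, sorted by name."""
    return sorted(EXPECTED_EVENTS - set(observed_names))
```

An empty result means every expected event fired at least once during the session.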

2) Production Enablement

  • Confirm production env has SKINBASE_UPLOADS_V2=true (or unset, default ON).
  • Deploy release artifact.
  • Run smoke tests on /upload with one image and one archive flow.
  • Confirm endpoints respond with expected status codes under normal load.

3) Post-Deploy Verification (0-24h)

  • Validate build artifact and route rendering:
    • /upload renders v2 wizard UI
    • no front-end boot errors in browser console
  • Validate pipeline behavior:
    • init/chunk/finish/status/publish/cancel all reachable
    • status polling transitions to ready/publishable where expected
  • Validate user outcomes:
    • completion and publish rates are stable versus the prior-day baseline
    • no spike in upload_cancel volume that would suggest UI confusion
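The status-polling check above can be scripted with a small loop. This sketch takes the status lookup as an injected callable, so it makes no assumption about the response shape of GET /api/uploads/{id}/status; the "failed" state, the interval, and the attempt limit are assumptions.

```python
import time

# "ready" and "publishable" come from the checklist; "failed" is assumed.
TERMINAL_STATES = {"ready", "publishable", "failed"}


def poll_until_terminal(fetch_status, upload_id, interval_s=1.0,
                        max_attempts=30, sleep=time.sleep):
    """Poll fetch_status(upload_id) until a terminal state or the attempt limit.

    Returns (final_state, distinct_states_seen_in_order).
    """
    seen = []
    for attempt in range(max_attempts):
        state = fetch_status(upload_id)
        if not seen or seen[-1] != state:
            seen.append(state)  # record each transition once
        if state in TERMINAL_STATES:
            return state, seen
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(
        f"upload {upload_id} not terminal after {max_attempts} polls"
    )
```

The returned transition list gives a quick record of what the pipeline did, which helps when a status gets stuck before reaching ready.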

Post-Deploy Monitoring Plan

Key Metrics

  • Upload start volume (upload_start)
  • Upload completion volume (upload_complete)
  • Publish success volume (upload_publish)
  • Error volume by stage (upload_error.stage)
  • Cancel volume (upload_cancel)
  • Derived funnel:
    • start -> complete conversion
    • complete -> publish conversion
    • overall start -> publish conversion
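The derived funnel is plain ratio arithmetic over the event counts. A minimal sketch, assuming the counts for a time window have already been pulled from the analytics store:

```python
def funnel(starts, completes, publishes):
    """Compute funnel conversions from upload_start/complete/publish counts."""
    def ratio(numerator, denominator):
        # None signals "no data" instead of dividing by zero.
        return numerator / denominator if denominator else None

    return {
        "start_to_complete": ratio(completes, starts),
        "complete_to_publish": ratio(publishes, completes),
        "start_to_publish": ratio(publishes, starts),
    }
```

For example, 200 starts, 150 completes, and 120 publishes give conversions of 0.75, 0.8, and 0.6 respectively.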

Operational Signals

  • API error rates for /api/uploads/*
  • p95 latency for init, chunk, finish, status, publish
  • 4xx/5xx split by endpoint
  • Client-side uncaught exceptions on /upload
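If these signals are not already available in a dashboard, they can be approximated from access-log records. The sketch below assumes records have been parsed into simple tuples and lists; the record shape and function names are hypothetical, and the p95 uses the nearest-rank method.

```python
from collections import defaultdict


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def status_split(records):
    """Per-endpoint 4xx and 5xx rates from (endpoint, status_code) pairs."""
    counts = defaultdict(lambda: {"total": 0, "4xx": 0, "5xx": 0})
    for endpoint, status in records:
        bucket = counts[endpoint]
        bucket["total"] += 1
        if 400 <= status < 500:
            bucket["4xx"] += 1
        elif 500 <= status < 600:
            bucket["5xx"] += 1
    return {
        endpoint: {
            "4xx_rate": c["4xx"] / c["total"],
            "5xx_rate": c["5xx"] / c["total"],
        }
        for endpoint, c in counts.items()
    }
```

Keeping 4xx and 5xx separate matters here: a rise in 4xx on init or chunk often points at client-side validation, while 5xx points at the pipeline itself.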

Alert Thresholds (initial)

  • Critical rollback candidate:
    • upload_error rate > 2x baseline for 15+ minutes, or
    • publish failure rate > 5% sustained for 15+ minutes, or
    • any endpoint 5xx rate > 3% sustained for 10+ minutes.
  • Warning/observe:
    • completion funnel drops > 10% vs trailing 7-day average.
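The initial thresholds can be encoded as a small classifier for use in an alert rule or a manual check. This is a sketch under stated assumptions: the `AlertSample` fields are hypothetical, each `*_minutes` field means how long that metric has continuously exceeded its threshold, and the 10% funnel drop is read as a relative drop against the trailing 7-day average.

```python
from dataclasses import dataclass


@dataclass
class AlertSample:
    upload_error_rate: float
    baseline_upload_error_rate: float
    upload_error_minutes: int      # minutes the error rate has exceeded 2x baseline
    publish_failure_rate: float
    publish_failure_minutes: int   # minutes the publish failure rate has exceeded 5%
    max_endpoint_5xx_rate: float   # worst 5xx rate across /api/uploads/* endpoints
    endpoint_5xx_minutes: int      # minutes that worst rate has exceeded 3%
    completion_rate: float
    trailing_7d_completion_rate: float


def classify(sample):
    """Return "critical", "warning", or "ok" per the initial thresholds."""
    critical = (
        (sample.upload_error_rate > 2 * sample.baseline_upload_error_rate
         and sample.upload_error_minutes >= 15)
        or (sample.publish_failure_rate > 0.05
            and sample.publish_failure_minutes >= 15)
        or (sample.max_endpoint_5xx_rate > 0.03
            and sample.endpoint_5xx_minutes >= 10)
    )
    if critical:
        return "critical"
    average = sample.trailing_7d_completion_rate
    if average > 0:
        relative_drop = (average - sample.completion_rate) / average
        if relative_drop > 0.10:
            return "warning"
    return "ok"
```

A "critical" result marks a rollback candidate, not an automatic rollback; the decision still follows the rollback plan.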

Rollback Plan

Fast Toggle Rollback (preferred)

  1. Set SKINBASE_UPLOADS_V2=false.
  2. Reload config/cache per deploy process.
  3. Verify /upload serves legacy flow.
  4. Continue API monitoring until error rates normalize.

Release Rollback (if needed)

  1. Roll back to prior release artifact.
  2. Keep SKINBASE_UPLOADS_V2=false during stabilization.
  3. Re-run smoke test for upload + publish.

Communication

  • Post incident update in release channel with:
    • start time
    • impact scope (upload, publish, cancel)
    • rollback action taken
    • follow-up issue link

Ownership and Next Actions

  • Owner: Upload frontend + API maintainers.
  • First review checkpoint: 24h post deploy.
  • Second checkpoint: 7 days post deploy for legacy removal go/no-go.
  • If metrics remain healthy, create removal PR for legacy branch in /upload page component.