# Upload UI v2 Rollout Runbook ## Status - Upload UI v2 is production-ready. - Feature flag posture: `uploads_v2` default ON. - Emergency override remains available through `SKINBASE_UPLOADS_V2=false`. ## Scope - Route: `/upload` - UI: React/Inertia Upload Wizard v2 - API endpoints in use: - `POST /api/uploads/init` - `POST /api/uploads/chunk` - `POST /api/uploads/finish` - `GET /api/uploads/{id}/status` - `POST /api/uploads/{id}/publish` - `POST /api/uploads/cancel` ## Legacy Flow Policy - Current state: legacy upload flow remains in code behind feature flag branch. - Removal decision: **scheduled removal** (not immediate deletion). - Target window: remove legacy branch in the next hardening cycle after stable production operation. - Suggested checkpoint gates before removal: 1. 7 consecutive days with no Sev-1/Sev-2 upload regressions. 2. Upload completion rate at or above pre-v2 baseline. 3. No unresolved blockers in publish/cancel/status polling. ## Rollout Checklist ### 1) Staging - Set `SKINBASE_UPLOADS_V2=true` in staging env. - Build and deploy current commit. - Verify upload happy paths: - image upload (jpg/png/webp) - archive upload with required screenshots - cancel in-progress upload - publish after ready state - Verify failure paths: - invalid file type - over-size files - processing/publish API failure surfaces retry/reset correctly - Verify analytics events emitted in browser: - `upload_start` - `upload_complete` - `upload_publish` - `upload_cancel` - `upload_error` ### 2) Production Enablement - Confirm production env has `SKINBASE_UPLOADS_V2=true` (or unset, default ON). - Deploy release artifact. - Run smoke tests on `/upload` with one image and one archive flow. - Confirm endpoints respond with expected status codes under normal load. ### 3) Post-Deploy Verification (0-24h) - Validate build artifact and route rendering: - `/upload` renders v2 wizard UI - no front-end boot errors in browser console - Validate pipeline behavior: - init/chunk/finish/status/publish/cancel all reachable - status polling transitions to ready/publishable where expected - Validate user outcomes: - completion and publish rates are stable vs prior day baseline - no spike in cancellation due to UI confusion ## Post-Deploy Monitoring Plan ### Key Metrics - Upload start volume (`upload_start`) - Upload completion volume (`upload_complete`) - Publish success volume (`upload_publish`) - Error volume by stage (`upload_error.stage`) - Cancel volume (`upload_cancel`) - Derived funnel: - start -> complete conversion - complete -> publish conversion - overall start -> publish conversion ### Operational Signals - API error rates for `/api/uploads/*` - p95 latency for `init`, `chunk`, `finish`, `status`, `publish` - 4xx/5xx split by endpoint - Client-side uncaught exceptions on `/upload` ### Alert Thresholds (initial) - Critical rollback candidate: - `upload_error` rate > 2x baseline for 15+ minutes, or - publish failure rate > 5% sustained for 15+ minutes, or - any endpoint 5xx rate > 3% sustained for 10+ minutes. - Warning/observe: - completion funnel drops > 10% vs trailing 7-day average. ## Rollback Plan ### Fast Toggle Rollback (preferred) 1. Set `SKINBASE_UPLOADS_V2=false`. 2. Reload config/cache per deploy process. 3. Verify `/upload` serves legacy flow. 4. Continue API monitoring until error rates normalize. ### Release Rollback (if needed) 1. Roll back to prior release artifact. 2. Keep `SKINBASE_UPLOADS_V2=false` during stabilization. 3. Re-run smoke test for upload + publish. ### Communication - Post incident update in release channel with: - start time - impact scope (upload, publish, cancel) - rollback action taken - follow-up issue link ## Ownership and Next Actions - Owner: Upload frontend + API maintainers. - First review checkpoint: 24h post deploy. - Second checkpoint: 7 days post deploy for legacy removal go/no-go. - If metrics remain healthy, create removal PR for legacy branch in `/upload` page component.