4.0 KiB
4.0 KiB
Upload UI v2 Rollout Runbook
Status
- Upload UI v2 is production-ready.
- Feature flag posture:
uploads_v2default ON. - Emergency override remains available through
SKINBASE_UPLOADS_V2=false.
Scope
- Route:
/upload - UI: React/Inertia Upload Wizard v2
- API endpoints in use:
POST /api/uploads/initPOST /api/uploads/chunkPOST /api/uploads/finishGET /api/uploads/{id}/statusPOST /api/uploads/{id}/publishPOST /api/uploads/cancel
Legacy Flow Policy
- Current state: legacy upload flow remains in code behind feature flag branch.
- Removal decision: scheduled removal (not immediate deletion).
- Target window: remove legacy branch in the next hardening cycle after stable production operation.
- Suggested checkpoint gates before removal:
- 7 consecutive days with no Sev-1/Sev-2 upload regressions.
- Upload completion rate at or above pre-v2 baseline.
- No unresolved blockers in publish/cancel/status polling.
Rollout Checklist
1) Staging
- Set
SKINBASE_UPLOADS_V2=truein staging env. - Build and deploy current commit.
- Verify upload happy paths:
- image upload (jpg/png/webp)
- archive upload with required screenshots
- cancel in-progress upload
- publish after ready state
- Verify failure paths:
- invalid file type
- over-size files
- processing/publish API failure surfaces retry/reset correctly
- Verify analytics events emitted in browser:
upload_startupload_completeupload_publishupload_cancelupload_error
2) Production Enablement
- Confirm production env has
SKINBASE_UPLOADS_V2=true(or unset, default ON). - Deploy release artifact.
- Run smoke tests on
/uploadwith one image and one archive flow. - Confirm endpoints respond with expected status codes under normal load.
3) Post-Deploy Verification (0-24h)
- Validate build artifact and route rendering:
/uploadrenders v2 wizard UI- no front-end boot errors in browser console
- Validate pipeline behavior:
- init/chunk/finish/status/publish/cancel all reachable
- status polling transitions to ready/publishable where expected
- Validate user outcomes:
- completion and publish rates are stable vs prior day baseline
- no spike in cancellation due to UI confusion
Post-Deploy Monitoring Plan
Key Metrics
- Upload start volume (
upload_start) - Upload completion volume (
upload_complete) - Publish success volume (
upload_publish) - Error volume by stage (
upload_error.stage) - Cancel volume (
upload_cancel) - Derived funnel:
- start -> complete conversion
- complete -> publish conversion
- overall start -> publish conversion
Operational Signals
- API error rates for
/api/uploads/* - p95 latency for
init,chunk,finish,status,publish - 4xx/5xx split by endpoint
- Client-side uncaught exceptions on
/upload
Alert Thresholds (initial)
- Critical rollback candidate:
upload_errorrate > 2x baseline for 15+ minutes, or- publish failure rate > 5% sustained for 15+ minutes, or
- any endpoint 5xx rate > 3% sustained for 10+ minutes.
- Warning/observe:
- completion funnel drops > 10% vs trailing 7-day average.
Rollback Plan
Fast Toggle Rollback (preferred)
- Set
SKINBASE_UPLOADS_V2=false. - Reload config/cache per deploy process.
- Verify
/uploadserves legacy flow. - Continue API monitoring until error rates normalize.
Release Rollback (if needed)
- Roll back to prior release artifact.
- Keep
SKINBASE_UPLOADS_V2=falseduring stabilization. - Re-run smoke test for upload + publish.
Communication
- Post incident update in release channel with:
- start time
- impact scope (upload, publish, cancel)
- rollback action taken
- follow-up issue link
Ownership and Next Actions
- Owner: Upload frontend + API maintainers.
- First review checkpoint: 24h post deploy.
- Second checkpoint: 7 days post deploy for legacy removal go/no-go.
- If metrics remain healthy, create removal PR for legacy branch in
/uploadpage component.