Files
SkinbaseNova/docs/ui/upload-v2-rollout-runbook.md
2026-02-14 15:14:12 +01:00

131 lines
4.0 KiB
Markdown

# Upload UI v2 Rollout Runbook
## Status
- Upload UI v2 is production-ready.
- Feature flag posture: `uploads_v2` default ON.
- Emergency override remains available through `SKINBASE_UPLOADS_V2=false`.
## Scope
- Route: `/upload`
- UI: React/Inertia Upload Wizard v2
- API endpoints in use:
- `POST /api/uploads/init`
- `POST /api/uploads/chunk`
- `POST /api/uploads/finish`
- `GET /api/uploads/{id}/status`
- `POST /api/uploads/{id}/publish`
- `POST /api/uploads/cancel`
## Legacy Flow Policy
- Current state: legacy upload flow remains in code behind feature flag branch.
- Removal decision: **scheduled removal** (not immediate deletion).
- Target window: remove legacy branch in the next hardening cycle after stable production operation.
- Suggested checkpoint gates before removal:
1. 7 consecutive days with no Sev-1/Sev-2 upload regressions.
2. Upload completion rate at or above pre-v2 baseline.
3. No unresolved blockers in publish/cancel/status polling.
## Rollout Checklist
### 1) Staging
- Set `SKINBASE_UPLOADS_V2=true` in staging env.
- Build and deploy current commit.
- Verify upload happy paths:
- image upload (jpg/png/webp)
- archive upload with required screenshots
- cancel in-progress upload
- publish after ready state
- Verify failure paths:
- invalid file type
- over-size files
- processing/publish API failure surfaces retry/reset correctly
- Verify analytics events emitted in browser:
- `upload_start`
- `upload_complete`
- `upload_publish`
- `upload_cancel`
- `upload_error`
### 2) Production Enablement
- Confirm production env has `SKINBASE_UPLOADS_V2=true` (or unset, default ON).
- Deploy release artifact.
- Run smoke tests on `/upload` with one image and one archive flow.
- Confirm endpoints respond with expected status codes under normal load.
### 3) Post-Deploy Verification (0-24h)
- Validate build artifact and route rendering:
- `/upload` renders v2 wizard UI
- no front-end boot errors in browser console
- Validate pipeline behavior:
- init/chunk/finish/status/publish/cancel all reachable
- status polling transitions to ready/publishable where expected
- Validate user outcomes:
- completion and publish rates are stable vs prior day baseline
- no spike in cancellation due to UI confusion
## Post-Deploy Monitoring Plan
### Key Metrics
- Upload start volume (`upload_start`)
- Upload completion volume (`upload_complete`)
- Publish success volume (`upload_publish`)
- Error volume by stage (`upload_error.stage`)
- Cancel volume (`upload_cancel`)
- Derived funnel:
- start -> complete conversion
- complete -> publish conversion
- overall start -> publish conversion
### Operational Signals
- API error rates for `/api/uploads/*`
- p95 latency for `init`, `chunk`, `finish`, `status`, `publish`
- 4xx/5xx split by endpoint
- Client-side uncaught exceptions on `/upload`
### Alert Thresholds (initial)
- Critical rollback candidate:
- `upload_error` rate > 2x baseline for 15+ minutes, or
- publish failure rate > 5% sustained for 15+ minutes, or
- any endpoint 5xx rate > 3% sustained for 10+ minutes.
- Warning/observe:
- completion funnel drops > 10% vs trailing 7-day average.
## Rollback Plan
### Fast Toggle Rollback (preferred)
1. Set `SKINBASE_UPLOADS_V2=false`.
2. Reload config/cache per deploy process.
3. Verify `/upload` serves legacy flow.
4. Continue API monitoring until error rates normalize.
### Release Rollback (if needed)
1. Roll back to prior release artifact.
2. Keep `SKINBASE_UPLOADS_V2=false` during stabilization.
3. Re-run smoke test for upload + publish.
### Communication
- Post incident update in release channel with:
- start time
- impact scope (upload, publish, cancel)
- rollback action taken
- follow-up issue link
## Ownership and Next Actions
- Owner: Upload frontend + API maintainers.
- First review checkpoint: 24h post deploy.
- Second checkpoint: 7 days post deploy for legacy removal go/no-go.
- If metrics remain healthy, create removal PR for legacy branch in `/upload` page component.