131 lines
4.0 KiB
Markdown
131 lines
4.0 KiB
Markdown
# Upload UI v2 Rollout Runbook
|
|
|
|
## Status
|
|
|
|
- Upload UI v2 is production-ready.
|
|
- Feature flag posture: `uploads_v2` default ON.
|
|
- Emergency override remains available through `SKINBASE_UPLOADS_V2=false`.
|
|
|
|
## Scope
|
|
|
|
- Route: `/upload`
|
|
- UI: React/Inertia Upload Wizard v2
|
|
- API endpoints in use:
|
|
- `POST /api/uploads/init`
|
|
- `POST /api/uploads/chunk`
|
|
- `POST /api/uploads/finish`
|
|
- `GET /api/uploads/{id}/status`
|
|
- `POST /api/uploads/{id}/publish`
|
|
- `POST /api/uploads/cancel`
|
|
|
|
## Legacy Flow Policy
|
|
|
|
- Current state: legacy upload flow remains in code behind feature flag branch.
|
|
- Removal decision: **scheduled removal** (not immediate deletion).
|
|
- Target window: remove legacy branch in the next hardening cycle after stable production operation.
|
|
- Suggested checkpoint gates before removal:
|
|
1. 7 consecutive days with no Sev-1/Sev-2 upload regressions.
|
|
2. Upload completion rate at or above pre-v2 baseline.
|
|
3. No unresolved blockers in publish/cancel/status polling.
|
|
|
|
## Rollout Checklist
|
|
|
|
### 1) Staging
|
|
|
|
- Set `SKINBASE_UPLOADS_V2=true` in staging env.
|
|
- Build and deploy current commit.
|
|
- Verify upload happy paths:
|
|
- image upload (jpg/png/webp)
|
|
- archive upload with required screenshots
|
|
- cancel in-progress upload
|
|
- publish after ready state
|
|
- Verify failure paths:
|
|
- invalid file type
|
|
- over-size files
|
|
- processing/publish API failure surfaces retry/reset correctly
|
|
- Verify analytics events emitted in browser:
|
|
- `upload_start`
|
|
- `upload_complete`
|
|
- `upload_publish`
|
|
- `upload_cancel`
|
|
- `upload_error`
|
|
|
|
### 2) Production Enablement
|
|
|
|
- Confirm production env has `SKINBASE_UPLOADS_V2=true` (or unset, default ON).
|
|
- Deploy release artifact.
|
|
- Run smoke tests on `/upload` with one image and one archive flow.
|
|
- Confirm endpoints respond with expected status codes under normal load.
|
|
|
|
### 3) Post-Deploy Verification (0-24h)
|
|
|
|
- Validate build artifact and route rendering:
|
|
- `/upload` renders v2 wizard UI
|
|
- no front-end boot errors in browser console
|
|
- Validate pipeline behavior:
|
|
- init/chunk/finish/status/publish/cancel all reachable
|
|
- status polling transitions to ready/publishable where expected
|
|
- Validate user outcomes:
|
|
- completion and publish rates are stable vs prior day baseline
|
|
- no spike in cancellation due to UI confusion
|
|
|
|
## Post-Deploy Monitoring Plan
|
|
|
|
### Key Metrics
|
|
|
|
- Upload start volume (`upload_start`)
|
|
- Upload completion volume (`upload_complete`)
|
|
- Publish success volume (`upload_publish`)
|
|
- Error volume by stage (`upload_error.stage`)
|
|
- Cancel volume (`upload_cancel`)
|
|
- Derived funnel:
|
|
- start -> complete conversion
|
|
- complete -> publish conversion
|
|
- overall start -> publish conversion
|
|
|
|
### Operational Signals
|
|
|
|
- API error rates for `/api/uploads/*`
|
|
- p95 latency for `init`, `chunk`, `finish`, `status`, `publish`
|
|
- 4xx/5xx split by endpoint
|
|
- Client-side uncaught exceptions on `/upload`
|
|
|
|
### Alert Thresholds (initial)
|
|
|
|
- Critical rollback candidate:
|
|
- `upload_error` rate > 2x baseline for 15+ minutes, or
|
|
- publish failure rate > 5% sustained for 15+ minutes, or
|
|
- any endpoint 5xx rate > 3% sustained for 10+ minutes.
|
|
- Warning/observe:
|
|
- completion funnel drops > 10% vs trailing 7-day average.
|
|
|
|
## Rollback Plan
|
|
|
|
### Fast Toggle Rollback (preferred)
|
|
|
|
1. Set `SKINBASE_UPLOADS_V2=false`.
|
|
2. Reload config/cache per deploy process.
|
|
3. Verify `/upload` serves legacy flow.
|
|
4. Continue API monitoring until error rates normalize.
|
|
|
|
### Release Rollback (if needed)
|
|
|
|
1. Roll back to prior release artifact.
|
|
2. Keep `SKINBASE_UPLOADS_V2=false` during stabilization.
|
|
3. Re-run smoke test for upload + publish.
|
|
|
|
### Communication
|
|
|
|
- Post incident update in release channel with:
|
|
- start time
|
|
- impact scope (upload, publish, cancel)
|
|
- rollback action taken
|
|
- follow-up issue link
|
|
|
|
## Ownership and Next Actions
|
|
|
|
- Owner: Upload frontend + API maintainers.
|
|
- First review checkpoint: 24h post deploy.
|
|
- Second checkpoint: 7 days post deploy for legacy removal go/no-go.
|
|
- If metrics remain healthy, create removal PR for legacy branch in `/upload` page component.
|