# Feed Rollout Runbook (clip-cosine-v2, prod set 1)

## Scope

- Candidate: `clip-cosine-v2` with weights `w1=0.52, w2=0.23, w3=0.15, w4=0.10`
- Baseline: `clip-cosine-v1`
- Rollout gates: `10% -> 50% -> 100%`
- Temporary policy: `save_rate` is informational only until save-event schema reliability is confirmed in production.

## Pre-flight checks

1. Confirm config values:
   - `DISCOVERY_ROLLOUT_ENABLED=true`
   - `DISCOVERY_ROLLOUT_BASELINE_ALGO_VERSION=clip-cosine-v1`
   - `DISCOVERY_ROLLOUT_CANDIDATE_ALGO_VERSION=clip-cosine-v2`
   - `DISCOVERY_ROLLOUT_ACTIVE_GATE=g10`
   - `DISCOVERY_FORCE_ALGO_VERSION` is empty
2. Confirm candidate weights are active in `config/discovery.php` and env overrides.
3. Confirm ingestion health for discovery events:
   - `event_id` populated for all new events
   - `favorite` and `download` events present in `user_discovery_events`
4. Run daily aggregation:
   - `php artisan analytics:aggregate-feed --date=YYYY-MM-DD`

## Gate progression

### Gate 1: 10%

- Set: `DISCOVERY_ROLLOUT_ACTIVE_GATE=g10`
- Observe for at least 2-3 days with minimum sample volume.
- Required checks:
  - CTR delta vs baseline
  - Long-dwell-share delta vs baseline
  - Diversity concentration delta vs baseline
  - Save-rate trend (informational only)

Promote to 50% only if no rollback trigger fires and no persistent warning trend is present.

### Gate 2: 50%

- Set: `DISCOVERY_ROLLOUT_ACTIVE_GATE=g50`
- Observe for 3-5 days with stable daily traffic.
- Apply same checks and thresholds.

Promote to 100% only with at least 2 consecutive healthy days.

### Gate 3: 100%

- Set: `DISCOVERY_ROLLOUT_ACTIVE_GATE=g100`
- Keep baseline available for rapid rollback via force toggle.

## Monitoring thresholds (candidate vs baseline)

- CTR:
  - Warning: drop >= 3%
  - Rollback: drop >= 5% (or >= 10% in a single severe window)
- Long dwell share (`(dwell_30_120 + dwell_120_plus) / clicks`):
  - Warning: drop >= 4%
  - Rollback: drop >= 8% (or >= 12% in a single severe window)
- Diversity concentration (e.g. top-author/top-category share, near-duplicate concentration):
  - Warning: rise >= 10%
  - Rollback: rise >= 15%

## Rollback actions

### Immediate rollback (fastest)

- Set `DISCOVERY_FORCE_ALGO_VERSION=clip-cosine-v1`
- Reload config/cache as needed in your deployment flow.
- Verify feed responses show `meta.algo_version=clip-cosine-v1`.

### Standard rollback

- Set `DISCOVERY_ROLLOUT_ACTIVE_GATE=g10` (or disable rollout)
- Keep candidate enabled only for controlled validation traffic.

## Save-event schema note and fix

Observed issue class in mixed environments: save-event writes can fail if discovery event schema differs from code expectations (e.g., `meta`/`metadata` drift, required `event_id`).

Implemented fix path:

- Ingestion now always writes `event_id` and inserts schema-aware metadata (`meta` if present, otherwise `metadata` if present).
- Keep `DISCOVERY_EVAL_SAVE_RATE_INFORMATIONAL=true` until production confirms stable save-event ingestion.

Validation query examples:

- Save events by day:
  - `SELECT event_date, COUNT(*) FROM user_discovery_events WHERE event_type IN ('favorite','download') GROUP BY event_date ORDER BY event_date DESC;`
- Null/empty event id check:
  - `SELECT COUNT(*) FROM user_discovery_events WHERE event_id IS NULL OR event_id = '';`

## Daily operator checklist

1. Run feed aggregation for the previous day.
2. Run evaluator and compare commands:
   - `php artisan analytics:evaluate-feed-weights --from=YYYY-MM-DD --to=YYYY-MM-DD --json`
   - `php artisan analytics:compare-feed-ab clip-cosine-v1 clip-cosine-v2 --from=YYYY-MM-DD --to=YYYY-MM-DD --json`
3. Record deltas for CTR, long_dwell_share, diversity concentration.
4. Record save_rate as informational only.
5. Decide: hold, promote gate, or rollback.

## First 24h verification checklist

1. Confirm rollout activation and gate state:
   - `DISCOVERY_ROLLOUT_ENABLED=true`
   - `DISCOVERY_ROLLOUT_ACTIVE_GATE=g10`
   - `DISCOVERY_FORCE_ALGO_VERSION` empty
2. Verify both algos are receiving traffic in analytics:
   - candidate (`clip-cosine-v2`) should be near 10% share (allow normal variance)
   - baseline (`clip-cosine-v1`) remains dominant
3. Run aggregation/evaluation at least twice in first day (midday + end-of-day):
   - `php artisan analytics:aggregate-feed --date=YYYY-MM-DD`
   - `php artisan analytics:evaluate-feed-weights --from=YYYY-MM-DD --to=YYYY-MM-DD --json`
   - `php artisan analytics:compare-feed-ab clip-cosine-v1 clip-cosine-v2 --from=YYYY-MM-DD --to=YYYY-MM-DD --json`
4. Check guardrails:
   - CTR drop < rollback threshold
   - long_dwell_share drop < rollback threshold
   - diversity concentration rise < rollback threshold
5. Check save-event ingestion health:
   - save events (`favorite`,`download`) are arriving in `user_discovery_events`
   - `event_id` is always populated
6. If any rollback trigger is breached, apply emergency rollback preset immediately.
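
The values listed under "Monitoring thresholds" can be turned into a small guardrail check when recording daily deltas. The following is a minimal sketch, not the evaluator's actual logic. It assumes the percentages are relative changes against the baseline, that the first rollback figure applies to the delta over the observation period, and that the "single severe window" figure applies to the worst individual window. The names `Threshold`, `adverse_change_pct`, `classify`, and `worst_window_pct` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    warning: float                         # adverse change (%) that raises a warning
    rollback: float                        # adverse change (%) that triggers rollback
    severe_window: Optional[float] = None  # single-window rollback level (%), if defined
    adverse_is_drop: bool = True           # True: a drop is bad; False: a rise is bad


# Values copied from the "Monitoring thresholds" section of this runbook.
THRESHOLDS = {
    "ctr": Threshold(3.0, 5.0, severe_window=10.0),
    "long_dwell_share": Threshold(4.0, 8.0, severe_window=12.0),
    "diversity_concentration": Threshold(10.0, 15.0, adverse_is_drop=False),
}


def adverse_change_pct(metric: str, baseline: float, candidate: float) -> float:
    """Relative change (%) in the adverse direction for the given metric.

    Assumption: deltas are relative to the baseline value, not percentage points.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    t = THRESHOLDS[metric]
    change = (baseline - candidate) if t.adverse_is_drop else (candidate - baseline)
    return change / baseline * 100.0


def classify(metric: str, sustained_pct: float,
             worst_window_pct: Optional[float] = None) -> str:
    """Return 'rollback', 'warning', or 'ok' for one metric."""
    t = THRESHOLDS[metric]
    if sustained_pct >= t.rollback:
        return "rollback"
    # Assumed reading of "single severe window": one bad window is enough.
    if (t.severe_window is not None and worst_window_pct is not None
            and worst_window_pct >= t.severe_window):
        return "rollback"
    if sustained_pct >= t.warning:
        return "warning"
    return "ok"
```

For example, `classify("ctr", 3.5)` yields `"warning"`, while `classify("ctr", 1.0, worst_window_pct=10.0)` yields `"rollback"` under the single-window reading assumed here. `save_rate` is deliberately absent from the table because it is informational only.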
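
Step 2 of the first-24h checklist asks that the candidate sit "near 10% share (allow normal variance)" without defining the band. One possible way to make that concrete is sketched below. It assumes each feed request is assigned independently, so the candidate count is roughly binomial, and it uses a 3-sigma tolerance. Both the technique and the function name `share_within_band` are illustrative assumptions, not part of the rollout tooling. If assignment is sticky per user, the real variance will be larger than this estimate.

```python
import math


def share_within_band(candidate_n: int, total_n: int,
                      expected_share: float = 0.10, z: float = 3.0) -> bool:
    """Check whether the observed candidate share is within z standard errors
    of the expected gate share (normal approximation to the binomial)."""
    if total_n <= 0:
        raise ValueError("total_n must be positive")
    observed = candidate_n / total_n
    # Standard error of a proportion under the expected share.
    stderr = math.sqrt(expected_share * (1.0 - expected_share) / total_n)
    return abs(observed - expected_share) <= z * stderr
```

With 10,000 requests the band is about 10% plus or minus 0.9 percentage points, so 1,030 candidate requests pass and 1,200 do not. At the `g50` gate, pass `expected_share=0.50`.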