Feed Rollout Runbook (clip-cosine-v2, prod set 1)
Scope
- Candidate: clip-cosine-v2 with weights w1=0.52, w2=0.23, w3=0.15, w4=0.10
- Baseline: clip-cosine-v1
- Rollout gates: 10% -> 50% -> 100%
- Temporary policy: save_rate is informational only until save-event schema reliability is confirmed in production.
Pre-flight checks
- Confirm config values:
  - DISCOVERY_ROLLOUT_ENABLED=true
  - DISCOVERY_ROLLOUT_BASELINE_ALGO_VERSION=clip-cosine-v1
  - DISCOVERY_ROLLOUT_CANDIDATE_ALGO_VERSION=clip-cosine-v2
  - DISCOVERY_ROLLOUT_ACTIVE_GATE=g10
  - DISCOVERY_FORCE_ALGO_VERSION is empty
- Confirm candidate weights are active in config/discovery.php and env overrides.
- Confirm ingestion health for discovery events:
  - event_id populated for all new events
  - favorite and download events present in user_discovery_events
- Run daily aggregation:
php artisan analytics:aggregate-feed --date=YYYY-MM-DD
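The config check in the first pre-flight item can be scripted. The following is a minimal sketch, not part of the application: the function name preflight_mismatches is hypothetical, and it only inspects a mapping of environment values handed to it (for example os.environ), which may differ from what the application has cached.

```python
# Expected values copied from the pre-flight list above.
EXPECTED = {
    "DISCOVERY_ROLLOUT_ENABLED": "true",
    "DISCOVERY_ROLLOUT_BASELINE_ALGO_VERSION": "clip-cosine-v1",
    "DISCOVERY_ROLLOUT_CANDIDATE_ALGO_VERSION": "clip-cosine-v2",
    "DISCOVERY_ROLLOUT_ACTIVE_GATE": "g10",
    "DISCOVERY_FORCE_ALGO_VERSION": "",  # must be empty (unset counts as empty)
}


def preflight_mismatches(env):
    """Return human-readable mismatches; an empty list means every check passed.

    `env` is any mapping of variable name to string value (hypothetical helper).
    """
    problems = []
    for key, want in EXPECTED.items():
        got = (env.get(key) or "").strip()
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems
```

Calling preflight_mismatches(os.environ) in the deployment shell and printing the result gives a quick go/no-go before the first gate is enabled.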
Gate progression
Gate 1: 10%
- Set: DISCOVERY_ROLLOUT_ACTIVE_GATE=g10
- Observe for at least 2-3 days with minimum sample volume.
- Required checks:
  - CTR delta vs baseline
  - Long-dwell-share delta vs baseline
  - Diversity concentration delta vs baseline
  - Save-rate trend (informational only)
Promote to 50% only if no rollback trigger fires and no persistent warning trend is present.
Gate 2: 50%
- Set: DISCOVERY_ROLLOUT_ACTIVE_GATE=g50
- Observe for 3-5 days with stable daily traffic.
- Apply the same checks and thresholds as in Gate 1.
Promote to 100% only with at least 2 consecutive healthy days.
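The "2 consecutive healthy days" rule can be checked mechanically. This is a sketch under stated assumptions: the status labels ("healthy", "warning", "rollback") and both function names are hypothetical, and "consecutive" is read as the most recent days in the observation window.

```python
def trailing_healthy_days(daily_status):
    """Count consecutive 'healthy' entries at the end of the list (oldest first)."""
    count = 0
    for status in reversed(daily_status):
        if status != "healthy":
            break
        count += 1
    return count


def can_promote_to_100(daily_status, required=2):
    # Promotion needs the most recent `required` days to all be healthy.
    return trailing_healthy_days(daily_status) >= required
```

For example, a window of warning, healthy, healthy qualifies, while healthy, warning, healthy does not, because the healthy streak was interrupted one day ago.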
Gate 3: 100%
- Set: DISCOVERY_ROLLOUT_ACTIVE_GATE=g100
- Keep baseline available for rapid rollback via the force toggle (DISCOVERY_FORCE_ALGO_VERSION).
Monitoring thresholds (candidate vs baseline)
- CTR:
  - Warning: drop >= 3%
  - Rollback: drop >= 5% (or >= 10% in a single severe window)
- Long dwell share ((dwell_30_120 + dwell_120_plus) / clicks):
  - Warning: drop >= 4%
  - Rollback: drop >= 8% (or >= 12% in a single severe window)
- Diversity concentration (e.g. top-author/top-category share, near-duplicate concentration):
  - Warning: rise >= 10%
  - Rollback: rise >= 15%
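The thresholds above can be applied to a pair of metric values as follows. This is an illustrative sketch, not the evaluator's actual logic: it assumes the percentages are relative changes of the candidate against the baseline (the runbook does not say whether they are relative or percentage points), and it leaves out the single-severe-window thresholds because the runbook does not define what a severe window is. The names THRESHOLDS, relative_change_pct and classify are hypothetical.

```python
# Thresholds in percent, copied from the list above.
# direction is -1 when a drop is adverse, +1 when a rise is adverse.
THRESHOLDS = {
    "ctr": {"direction": -1, "warning": 3.0, "rollback": 5.0},
    "long_dwell_share": {"direction": -1, "warning": 4.0, "rollback": 8.0},
    "diversity_concentration": {"direction": +1, "warning": 10.0, "rollback": 15.0},
}


def relative_change_pct(baseline, candidate):
    """Relative change of candidate vs baseline, in percent (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative change")
    return (candidate - baseline) / baseline * 100.0


def classify(metric, baseline, candidate):
    """Return 'ok', 'warning' or 'rollback' for one metric."""
    t = THRESHOLDS[metric]
    # Flip the sign so that a positive number always means "moved the bad way".
    adverse = t["direction"] * relative_change_pct(baseline, candidate)
    if adverse >= t["rollback"]:
        return "rollback"
    if adverse >= t["warning"]:
        return "warning"
    return "ok"
```

With a baseline CTR of 0.10, a candidate CTR of 0.096 is a 4% relative drop and classifies as a warning, while 0.094 is a 6% drop and classifies as a rollback. Values that sit exactly on a threshold are subject to floating-point rounding, so treat near-boundary results with care.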
Rollback actions
Immediate rollback (fastest)
- Set DISCOVERY_FORCE_ALGO_VERSION=clip-cosine-v1
- Reload config/cache as needed in your deployment flow.
- Verify feed responses show meta.algo_version=clip-cosine-v1.
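The verification step can be scripted against a few captured feed responses. The sketch below assumes each response body is a JSON object with a top-level meta object carrying algo_version, as the dotted path above suggests; how the responses are fetched is left out because the feed endpoint is not stated in this runbook. Both function names are hypothetical.

```python
import json


def algo_version_of(response_body):
    """Extract meta.algo_version from a JSON feed response body, or None if absent."""
    payload = json.loads(response_body)
    meta = payload.get("meta") or {}
    return meta.get("algo_version")


def rollback_verified(response_bodies, expected="clip-cosine-v1"):
    """True only if at least one body was checked and all report the expected version."""
    versions = [algo_version_of(body) for body in response_bodies]
    return bool(versions) and all(v == expected for v in versions)
```

Sampling several responses rather than one guards against a stale cache on a single node still serving the candidate.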
Standard rollback
- Set DISCOVERY_ROLLOUT_ACTIVE_GATE=g10 (or disable rollout)
- Keep candidate enabled only for controlled validation traffic.
Save-event schema note and fix
Observed issue class in mixed environments: save-event writes can fail when the discovery event schema differs from what the code expects (e.g., the metadata column is named meta in some environments and metadata in others, or event_id is required but was not written).
Implemented fix path:
- Ingestion now always writes event_id and inserts schema-aware metadata (meta if present, otherwise metadata if present).
- Keep DISCOVERY_EVAL_SAVE_RATE_INFORMATIONAL=true until production confirms stable save-event ingestion.
Validation query examples:
- Save events by day:
SELECT event_date, COUNT(*) FROM user_discovery_events WHERE event_type IN ('favorite','download') GROUP BY event_date ORDER BY event_date DESC;
- Null/empty event id check:
SELECT COUNT(*) FROM user_discovery_events WHERE event_id IS NULL OR event_id = '';
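To try the two queries without touching production, they can be run against a throwaway in-memory table. This is a sketch only: the table is reduced to the three columns the queries reference, the column types are assumed, and the sample rows are made up for illustration. The helper names are hypothetical, and the production database engine may differ from SQLite.

```python
import sqlite3

SAVE_EVENTS_BY_DAY = (
    "SELECT event_date, COUNT(*) FROM user_discovery_events "
    "WHERE event_type IN ('favorite','download') "
    "GROUP BY event_date ORDER BY event_date DESC"
)
MISSING_EVENT_ID = (
    "SELECT COUNT(*) FROM user_discovery_events "
    "WHERE event_id IS NULL OR event_id = ''"
)


def save_event_health(conn):
    """Return (save events per day, count of rows with a missing event_id)."""
    by_day = conn.execute(SAVE_EVENTS_BY_DAY).fetchall()
    missing = conn.execute(MISSING_EVENT_ID).fetchone()[0]
    return by_day, missing


def demo_connection():
    # In-memory table with assumed column types and made-up sample rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_discovery_events "
        "(event_id TEXT, event_type TEXT, event_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_discovery_events VALUES (?, ?, ?)",
        [
            ("e1", "favorite", "2024-01-02"),
            ("e2", "download", "2024-01-02"),
            ("e3", "click", "2024-01-02"),
            ("", "favorite", "2024-01-01"),    # empty event_id
            (None, "download", "2024-01-01"),  # NULL event_id
        ],
    )
    return conn
```

In a healthy environment the second value returned by save_event_health should be 0 for rows written after the ingestion fix.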
Daily operator checklist
- Run feed aggregation for the previous day.
- Run evaluator and compare commands:
php artisan analytics:evaluate-feed-weights --from=YYYY-MM-DD --to=YYYY-MM-DD --json
php artisan analytics:compare-feed-ab clip-cosine-v1 clip-cosine-v2 --from=YYYY-MM-DD --to=YYYY-MM-DD --json
- Record deltas for CTR, long_dwell_share, diversity concentration.
- Record save_rate as informational only.
- Decide: hold, promote gate, or rollback.
First 24h verification checklist
- Confirm rollout activation and gate state:
  - DISCOVERY_ROLLOUT_ENABLED=true
  - DISCOVERY_ROLLOUT_ACTIVE_GATE=g10
  - DISCOVERY_FORCE_ALGO_VERSION is empty
- Verify both algos are receiving traffic in analytics:
  - candidate (clip-cosine-v2) should be near 10% share (allow normal variance)
  - baseline (clip-cosine-v1) remains dominant
- Run aggregation/evaluation at least twice in the first day (midday + end-of-day):
php artisan analytics:aggregate-feed --date=YYYY-MM-DD
php artisan analytics:evaluate-feed-weights --from=YYYY-MM-DD --to=YYYY-MM-DD --json
php artisan analytics:compare-feed-ab clip-cosine-v1 clip-cosine-v2 --from=YYYY-MM-DD --to=YYYY-MM-DD --json
- Check guardrails:
  - CTR drop < rollback threshold
  - long_dwell_share drop < rollback threshold
  - diversity concentration rise < rollback threshold
- Check save-event ingestion health:
  - save events (favorite, download) are arriving in user_discovery_events
  - event_id is always populated
- If any rollback trigger is breached, apply the immediate rollback (see Rollback actions) without delay.
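The traffic-share item in the checklist above ("near 10% share") can be turned into a simple check. This is a sketch under an assumption: the runbook does not quantify "normal variance", so the 2-percentage-point tolerance below is a placeholder to adjust, and the function names are hypothetical. The counts would come from whatever per-algorithm totals the analytics output provides.

```python
def candidate_share(baseline_count, candidate_count):
    """Fraction of feed traffic served by the candidate algorithm."""
    total = baseline_count + candidate_count
    if total == 0:
        raise ValueError("no traffic recorded for either algorithm")
    return candidate_count / total


def share_near_target(baseline_count, candidate_count, target=0.10, tolerance=0.02):
    # tolerance is an assumed placeholder, not a value from the runbook.
    share = candidate_share(baseline_count, candidate_count)
    return abs(share - target) <= tolerance
```

A candidate share far above the target at gate g10 usually points at a gate misconfiguration or a leftover force toggle, so it is worth rechecking the config values before reading any metric deltas.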