SkinbaseNova/docs/discovery-personalization-engine.md
2026-02-27 09:46:51 +01:00

# Discovery & Personalization Engine
Covers the trending system, following feed, personalized homepage, similar artworks, unified activity feed, and all input signal collection that powers the ranking formula.
---
## Table of Contents
1. [Architecture Overview](#1-architecture-overview)
2. [Input Signal Collection](#2-input-signal-collection)
3. [Windowed Stats (views & downloads)](#3-windowed-stats-views--downloads)
4. [Trending Engine](#4-trending-engine)
5. [Discover Routes](#5-discover-routes)
6. [Following Feed](#6-following-feed)
7. [Personalized Homepage](#7-personalized-homepage)
8. [Similar Artworks API](#8-similar-artworks-api)
9. [Unified Activity Feed](#9-unified-activity-feed)
10. [Meilisearch Configuration](#10-meilisearch-configuration)
11. [Caching Strategy](#11-caching-strategy)
12. [Scheduled Jobs](#12-scheduled-jobs)
13. [Testing](#13-testing)
14. [Operational Runbook](#14-operational-runbook)
---
## 1. Architecture Overview
```
Browser
  ├─ POST /api/art/{id}/view      → ArtworkViewController
  ├─ POST /api/art/{id}/download  → ArtworkDownloadController
  └─ POST /api/artworks/{id}/favorite / reactions / awards / comments
        ↓
ArtworkStatsService                 UserStatsService
  artwork_stats (all-time +           user_statistics
    windowed counters)                  ├─ artwork_views_received_count
  artwork_downloads (log)               └─ downloads_received_count
        ↓
skinbase:reset-windowed-stats (nightly/weekly)
  ├─ zeros views_24h / views_7d
  └─ recomputes downloads_24h / downloads_7d from log
        ↓
skinbase:recalculate-trending (every 30 min)
  ├─ bulk UPDATE artworks.trending_score_24h / _7d
  └─ dispatches IndexArtworkJob → Meilisearch
        ↓
Meilisearch index (artworks)
  ├─ sortable: trending_score_7d, trending_score_24h, views, ...
  └─ filterable: author_id, tags, category, orientation, is_public, ...
        ↓
HomepageService / DiscoverController / SimilarArtworksController
  └─ Redis cache (5 min TTL)
        ↓
Inertia + React frontend
```
---
## 2. Input Signal Collection
### 2.1 View tracking — `POST /api/art/{id}/view`
**Controller:** `App\Http\Controllers\Api\ArtworkViewController`
**Route name:** `api.art.view`
**Throttle:** 5 requests per 10 minutes per IP
**Deduplication (layered):**
| Layer | Mechanism | Scope |
|---|---|---|
| Client-side | `sessionStorage` key `sb_viewed_{id}` set before the request | Browser tab lifetime |
| Server-side | `$request->session()->put('art_viewed.{id}', true)` | Laravel session lifetime |
| Throttle | `throttle:5,10` route middleware | Per-IP per-artwork |
The React component `ArtworkActions.jsx` fires a `useEffect` on mount that checks `sessionStorage` first, then hits the endpoint. The response includes `counted: true|false` so callers can confirm whether the increment actually happened.
**What gets incremented:**
```
artwork_stats.views +1 (all-time)
artwork_stats.views_24h +1 (zeroed nightly)
artwork_stats.views_7d +1 (zeroed weekly)
user_statistics.artwork_views_received_count +1 (creator aggregate)
```
Via `ArtworkStatsService::incrementViews()` with `defer: true` (Redis when available, direct DB fallback).
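The server-side layer of the deduplication table can be pictured as a check-and-mark on the session. This is a minimal sketch in plain PHP, assuming an array stands in for the Laravel session store; `shouldCountView` is a hypothetical helper name, not the controller's actual code.

```php
<?php

/**
 * Hypothetical helper: returns true only the first time an artwork is seen
 * in this session, and records a marker so repeat views are ignored.
 * $session is an assumed stand-in for the Laravel session store.
 */
function shouldCountView(array &$session, int $artworkId): bool
{
    $key = "art_viewed.{$artworkId}";

    if (!empty($session[$key])) {
        return false; // already counted during this session
    }

    $session[$key] = true; // mark before the increment is issued
    return true;
}

$session = [];
var_dump(shouldCountView($session, 42)); // first view of artwork 42
var_dump(shouldCountView($session, 42)); // repeat view of artwork 42
```

The boolean maps naturally onto the `counted` flag in the response, so the client can tell whether the increment happened.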
---
### 2.2 Download tracking — `POST /api/art/{id}/download`
**Controller:** `App\Http\Controllers\Api\ArtworkDownloadController`
**Route name:** `api.art.download`
**Throttle:** 10 requests per minute per IP
The endpoint:
1. Inserts a row in `artwork_downloads` (persistent event log with `created_at`)
2. Increments `artwork_stats.downloads`, `downloads_24h`, `downloads_7d`
3. Returns `{"ok": true, "url": "<highest-res thumbnail URL>"}` for the native browser download
The `<a download>` buttons in `ArtworkActions.jsx` call `trackDownload()` on click — a fire-and-forget `fetch()` POST. The actual browser download is triggered by the `href`/`download` attributes and is never blocked by the tracking request.
**What gets incremented:**
```
artwork_downloads INSERT (event log, persisted forever)
artwork_stats.downloads +1 (all-time)
artwork_stats.downloads_24h +1 (recomputed from log nightly)
artwork_stats.downloads_7d +1 (recomputed from log weekly)
user_statistics.downloads_received_count +1 (creator aggregate)
```
Via `ArtworkStatsService::incrementDownloads()` with `defer: true`.
---
### 2.3 Other signals (pre-existing)
| Signal | Endpoint / Service | Written to |
|---|---|---|
| Favorite toggle | `POST /api/artworks/{id}/favorite` | `user_favorites`, `artwork_stats.favorites` |
| Reaction toggle | `POST /api/artworks/{id}/reactions` | `artwork_reactions` |
| Award | `ArtworkAwardController` | `artwork_award_stats.score_total` |
| Comment | `ArtworkCommentController` | `artwork_comments`, `activity_events` |
| Follow | `FollowService` | `user_followers`, `activity_events` |
---
### 2.4 ArtworkStatsService — Redis deferral
When Redis is available, all increments are pushed to the list key `artwork_stats:deltas` as JSON payloads. A separate job/command (`processPendingFromRedis`) drains the list and applies the deltas in bulk through `applyDelta()`. If Redis is unavailable, the service falls back transparently to a direct DB increment.
```php
// Deferred (default for view/download controllers)
$svc->incrementViews($artworkId, 1, defer: true);
// Immediate: defer: false writes straight to the DB, for callers that need
// instant feedback (as a favorites toggle does)
$svc->incrementDownloads($artworkId, 1, defer: false);
```
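Draining the list is cheaper when repeated increments for the same artwork are summed before anything touches the database. The sketch below shows that aggregation step only. The payload shape (`artwork_id`, `field`, `value`) and the function name `aggregateDeltas` are assumptions for illustration; the real payload format is not shown in this document.

```php
<?php

/**
 * Collapse queued JSON payloads into one summed delta per artwork and field.
 * Assumed payload shape: {"artwork_id": int, "field": string, "value": int}.
 *
 * @param  string[] $payloads Raw JSON strings popped from the Redis list
 * @return array<int, array<string, int>> artworkId => [field => summed delta]
 */
function aggregateDeltas(array $payloads): array
{
    $totals = [];

    foreach ($payloads as $raw) {
        $p = json_decode($raw, true);
        if (!is_array($p) || !isset($p['artwork_id'], $p['field'], $p['value'])) {
            continue; // skip malformed entries instead of failing the batch
        }

        $id = (int) $p['artwork_id'];
        $field = (string) $p['field'];
        $totals[$id][$field] = ($totals[$id][$field] ?? 0) + (int) $p['value'];
    }

    return $totals;
}

$queue = [
    '{"artwork_id": 7, "field": "views", "value": 1}',
    '{"artwork_id": 7, "field": "views", "value": 1}',
    '{"artwork_id": 9, "field": "downloads", "value": 1}',
    'not json',
];
print_r(aggregateDeltas($queue));
```

Each entry of the result would then become a single `applyDelta()` call rather than one write per original event.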
---
## 3. Windowed Stats (views & downloads)
### 3.1 Why windowed columns?
The trending formula needs _recent_ activity, not all-time totals. `artwork_stats.views` is a monotonically increasing counter — using it for trending would permanently favour old popular artworks and new artworks could never compete.
The solution is four cached window columns refreshed on a schedule:
| Column | Meaning | Reset cadence |
|---|---|---|
| `views_24h` | Views since the last nightly reset | Nightly at 03:30 |
| `views_7d` | Views since last Monday reset | Weekly (Mon) at 03:30 |
| `downloads_24h` | Downloads in last 24 h | Nightly at 03:30 (recomputed from log) |
| `downloads_7d` | Downloads in last 7 days | Weekly (Mon) at 03:30 (recomputed from log) |
### 3.2 How views windowing works
**No per-view event log exists** (storing millions of view rows would be expensive). Instead:
- Every view event increments `views_24h` and `views_7d` alongside `views`.
- The reset command **zeroes** both columns. Artworks re-accumulate from the reset time onward.
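- Accuracy is "views since last reset", not a true sliding window: a counter covers anywhere from zero up to one full window (24 h or 7 days) of activity, depending on how long ago the reset ran. This is considered close enough for trending.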
### 3.3 How downloads windowing works
**`artwork_downloads` is a full event log** with `created_at`. The reset command:
1. Runs `SELECT COUNT(*) FROM artwork_downloads WHERE artwork_id = ? AND created_at >= NOW() - {interval}` for each artwork, processing artworks in chunks of 1000.
2. Writes the exact count back to `downloads_24h` / `downloads_7d`.
This overwrites any drift from deferred Redis increments, making download windows always accurate at reset time.
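The recount can be sketched in plain PHP by computing the cutoff in application code and counting log rows that fall after it. This is an illustration over an in-memory array, not the command's actual query; `countDownloadsInWindow` and the row shape are assumed names.

```php
<?php

/**
 * Count download-log rows per artwork that fall inside the window.
 *
 * @param array<int, array{artwork_id:int, created_at:string}> $log
 * @param string $window ISO 8601 duration, e.g. 'P1D' (24h) or 'P7D' (7d)
 * @return array<int, int> artworkId => downloads inside the window
 */
function countDownloadsInWindow(array $log, DateTimeImmutable $now, string $window): array
{
    $cutoff = $now->sub(new DateInterval($window));
    $counts = [];

    foreach ($log as $row) {
        if (new DateTimeImmutable($row['created_at']) >= $cutoff) {
            $id = $row['artwork_id'];
            $counts[$id] = ($counts[$id] ?? 0) + 1;
        }
    }

    return $counts;
}

$now = new DateTimeImmutable('2026-02-27 03:30:00');
$log = [
    ['artwork_id' => 1, 'created_at' => '2026-02-27 01:00:00'],
    ['artwork_id' => 1, 'created_at' => '2026-02-25 12:00:00'],
    ['artwork_id' => 2, 'created_at' => '2026-02-26 04:00:00'],
];
print_r(countDownloadsInWindow($log, $now, 'P1D'));
print_r(countDownloadsInWindow($log, $now, 'P7D'));
```

Because the count is derived from the log rather than from the running counter, any increments lost or duplicated in the deferred path are corrected at reset time.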
### 3.4 Reset command
```bash
php artisan skinbase:reset-windowed-stats --period=24h
php artisan skinbase:reset-windowed-stats --period=7d
```
The command uses a chunked PHP loop instead of MySQL-specific `GREATEST()` / `INTERVAL` syntax, so it runs on both the production MySQL database and the SQLite test database.
---
## 4. Trending Engine
### 4.1 Formula
```
score = (award_score × 5.0)
+ (favorites × 3.0)
+ (reactions × 2.0)
+ (downloads_Xd × 1.0) ← windowed: 24h or 7d
+ (views_Xd × 2.0) ← windowed: 24h or 7d
- (hours_since_published × 0.1)
score = max(score, 0) ← clamped via GREATEST()
```
Weights are constants in `TrendingService` (`W_AWARD`, `W_FAVORITE`, etc.) — adjust without a schema change.
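For reference, the formula translates directly into a small pure function. This is a sketch, not the service itself: the class name `TrendingSketch` is hypothetical, and only `W_AWARD` and `W_FAVORITE` are constant names confirmed above; the remaining names are assumed.

```php
<?php

/** Hypothetical stand-in for the weight constants kept in TrendingService. */
final class TrendingSketch
{
    public const W_AWARD = 5.0;
    public const W_FAVORITE = 3.0;
    public const W_REACTION = 2.0;  // name assumed
    public const W_DOWNLOAD = 1.0;  // name assumed
    public const W_VIEW = 2.0;      // name assumed
    public const W_AGE_DECAY = 0.1; // name assumed; penalty per hour since publish

    public static function score(
        float $awardScore,
        int $favorites,
        int $reactions,
        int $windowDownloads,
        int $windowViews,
        float $hoursSincePublished
    ): float {
        $score = $awardScore * self::W_AWARD
            + $favorites * self::W_FAVORITE
            + $reactions * self::W_REACTION
            + $windowDownloads * self::W_DOWNLOAD
            + $windowViews * self::W_VIEW
            - $hoursSincePublished * self::W_AGE_DECAY;

        return max($score, 0.0); // same clamp that GREATEST() applies in SQL
    }
}

// Two-day-old artwork with moderate engagement: 10 + 12 + 6 + 10 + 100 - 4.8
echo TrendingSketch::score(2.0, 4, 3, 10, 50, 48.0), PHP_EOL;
// Old artwork with a single recent view: 2 - 10 is negative, so it clamps to 0
echo TrendingSketch::score(0.0, 0, 0, 0, 1, 100.0), PHP_EOL;
```

The age penalty grows without bound while the engagement terms depend on the window, which is why the clamp matters: without it, older artworks would carry ever larger negative scores.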
### 4.2 Output columns
| Artworks column | Meaning |
|---|---|
| `trending_score_24h` | Score using `views_24h` + `downloads_24h`; targets artworks ≤ 7 days old |
| `trending_score_7d` | Score using `views_7d` + `downloads_7d`; targets artworks ≤ 30 days old |
| `last_trending_calculated_at` | Timestamp of last calculation |
### 4.3 Recalculation command
```bash
php artisan skinbase:recalculate-trending --period=24h
php artisan skinbase:recalculate-trending --period=7d
php artisan skinbase:recalculate-trending --period=all
php artisan skinbase:recalculate-trending --period=7d --skip-index # skip Meilisearch jobs
php artisan skinbase:recalculate-trending --chunk=500 # smaller DB chunks
```
**Implementation:** `App\Services\TrendingService::recalculate()`
1. Chunks artworks published within the look-back window (`chunkById(1000, ...)`).
2. Issues one bulk MySQL `UPDATE ... WHERE id IN (...)` per chunk — no per-artwork queries in the hot path.
3. After each chunk, dispatches `IndexArtworkJob` per artwork to push updated scores to Meilisearch (skippable with `--skip-index`).
> **Note:** The raw SQL uses `GREATEST()` and `TIMESTAMPDIFF(HOUR, ...)`, which are MySQL syntax that SQLite does not support. The command runs against MySQL in production; the 4 related Pest tests are skipped on SQLite with a clear skip message.
### 4.4 Meilisearch sync after calculation
`TrendingService::syncToSearchIndex()` dispatches `IndexArtworkJob` for every artwork in the trending window. The job calls `Artwork::searchable()` which triggers `toSearchableArray()`, which includes `trending_score_24h` and `trending_score_7d`.
---
## 5. Discover Routes
All routes under `/discover/*` are registered in `routes/web.php` and handled by `App\Http\Controllers\Web\DiscoverController`. All use **Meilisearch sorting** — no SQL `ORDER BY` in the hot path.
| Route | Name | Sort key | Auth |
|---|---|---|---|
| `/discover/trending` | `discover.trending` | `trending_score_7d:desc` | No |
| `/discover/fresh` | `discover.fresh` | `created_at:desc` | No |
| `/discover/top-rated` | `discover.top-rated` | `likes:desc` | No |
| `/discover/most-downloaded` | `discover.most-downloaded` | `downloads:desc` | No |
| `/discover/following` | `discover.following` | `created_at:desc` (DB) | Yes |
---
## 6. Following Feed
**Route:** `GET /discover/following` (auth required)
**Controller:** `DiscoverController::following()`
### Logic
```
1. Get user's following IDs from user_followers
2. If empty → show empty state (see below)
3. If present → Artwork::whereIn('user_id', $followingIds)
->orderByDesc('published_at')
->paginate(24)
+ cached 1 min per user per page
```
### Empty state
When the user follows nobody:
- `fallback_trending` — up to 12 trending artworks (Meilisearch, with DB fallback)
- `fallback_creators` — 8 most-followed verified users (ordered by `user_statistics.followers_count`)
- `empty: true` flag passed to the view
- The `discoverTrending()` call is wrapped in `try/catch` so a Meilisearch outage never breaks the empty state page
---
## 7. Personalized Homepage
**Controller:** `HomeController::index()`
**Service:** `App\Services\HomepageService`
### Guest sections
```php
[
'hero' => first featured artwork,
'trending' => 12 artworks sorted by trending_score_7d,
'fresh' => 12 newest artworks,
'tags' => 12 most-used tags,
'creators' => creator spotlight,
'news' => latest news posts,
]
```
### Authenticated sections (personalized)
```php
[
'hero' => same as guest,
'from_following' => artworks from followed creators (up to 12, cached 1 min),
'trending' => same as guest,
'by_tags' => artworks matching user's top 5 tags,
'by_categories' => fresh uploads in user's top 3 favourite categories,
'tags' => same as guest,
'creators' => same as guest,
'news' => same as guest,
'preferences' => { top_tags, top_categories },
]
```
### UserPreferenceService
`App\Services\UserPreferenceService::build(User $user)` — cached 5 min per user.
Computes preferences from the user's **favourited artworks**:
| Output key | Source |
|---|---|
| `top_tags` (up to 5) | Tags on artworks in `artwork_favourites` |
| `top_categories` (up to 3) | Categories on artworks in `artwork_favourites` |
| `followed_creators` | IDs from `user_followers` |
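Deriving `top_tags` amounts to a frequency count over the tags of favourited artworks. The sketch below assumes the tags have already been loaded as one list per artwork; `topTags` is a hypothetical helper, and the real service may well do the counting in SQL instead.

```php
<?php

/**
 * Rank tags by how many favourited artworks carry them.
 *
 * @param array<int, string[]> $favouritedArtworkTags one tag list per favourited artwork
 * @return string[] most frequent tags first, at most $limit entries
 */
function topTags(array $favouritedArtworkTags, int $limit = 5): array
{
    $counts = [];

    foreach ($favouritedArtworkTags as $tags) {
        // array_unique so a tag repeated on one artwork counts once
        foreach (array_unique($tags) as $tag) {
            $counts[$tag] = ($counts[$tag] ?? 0) + 1;
        }
    }

    arsort($counts); // highest count first, keys preserved

    // Numeric-looking tags become int keys, so cast back to strings
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}

$favourites = [
    ['space', 'dark'],
    ['space', 'minimal'],
    ['space', 'dark', 'neon'],
];
print_r(topTags($favourites, 2));
```

The same counting approach applies to `top_categories` with a limit of 3.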
### getTrending() — Meilisearch-first
```php
Artwork::search('')
->options([
'filter' => 'is_public = true AND is_approved = true',
'sort' => ['trending_score_7d:desc', 'trending_score_24h:desc', 'views:desc'],
])
->paginate($limit, 'page', 1);
```
Falls back to `getTrendingFromDb()` (`orderByDesc('trending_score_7d')`, with no correlated subqueries) when Meilisearch is unavailable.
---
## 8. Similar Artworks API
**Route:** `GET /api/art/{id}/similar`
**Controller:** `App\Http\Controllers\Api\SimilarArtworksController`
**Route name:** `api.art.similar`
**Throttle:** 60/min
**Cache:** 5 min per artwork ID
**Max results:** 12
### Similarity algorithm
Meilisearch filters are built in priority order:
```
is_public = true
is_approved = true
id != {source_id}
author_id != {source_author_id} ← same creator excluded
orientation = "{landscape|portrait}" ← only for non-square (visual coherence)
(tags = "X" OR tags = "Y" OR ...) ← tag overlap (primary signal)
OR (if no tags)
(category = "X" OR ...) ← category fallback
```
Meilisearch's own ranking then sorts by relevance within those filters. Results are mapped to a slim JSON shape: `{id, title, slug, thumb, url, author_id}`.
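The filter list above can be assembled into a single Meilisearch filter expression. The following is a sketch under assumptions: `buildSimilarFilter` and `quoteFilterValue` are hypothetical helper names, and values are escaped with a backslash before double quotes, which should be checked against the Meilisearch version in use.

```php
<?php

/** Wrap a value in double quotes, escaping embedded quotes and backslashes. */
function quoteFilterValue(string $value): string
{
    return '"' . addcslashes($value, '"\\') . '"';
}

/**
 * Build the filter string for similar artworks (hypothetical helper).
 *
 * @param string[] $tags       tags of the source artwork (primary signal)
 * @param string[] $categories categories, used only when there are no tags
 */
function buildSimilarFilter(
    int $sourceId,
    int $sourceAuthorId,
    ?string $orientation,
    array $tags,
    array $categories
): string {
    $parts = [
        'is_public = true',
        'is_approved = true',
        "id != {$sourceId}",
        "author_id != {$sourceAuthorId}",
    ];

    // Orientation is only applied to non-square artworks
    if ($orientation === 'landscape' || $orientation === 'portrait') {
        $parts[] = 'orientation = ' . quoteFilterValue($orientation);
    }

    // Tag overlap first; fall back to categories when the source has no tags
    $field = $tags !== [] ? 'tags' : 'category';
    $values = $tags !== [] ? $tags : $categories;

    if ($values !== []) {
        $ors = array_map(
            fn (string $v): string => "{$field} = " . quoteFilterValue($v),
            $values
        );
        $parts[] = '(' . implode(' OR ', $ors) . ')';
    }

    return implode(' AND ', $parts);
}

echo buildSimilarFilter(10, 3, 'landscape', ['space', 'neon'], ['Wallpapers']), PHP_EOL;
```

The resulting string would be passed as the `filter` option of the search call, in the same way as the `getTrending()` example in section 7.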
---
## 9. Unified Activity Feed
**Route:** `GET /community/activity?type=global|following`
**Controller:** `App\Http\Controllers\Web\CommunityActivityController`
### `activity_events` schema
| Column | Type | Notes |
|---|---|---|
| `id` | bigint PK | |
| `actor_id` | bigint FK users | Who did the action |
| `type` | varchar | `upload` `comment` `favorite` `award` `follow` |
| `target_type` | varchar | `artwork` `user` |
| `target_id` | bigint | ID of the target object |
| `meta` | json nullable | Extra data (e.g. award tier) |
| `created_at` | timestamp | No `updated_at` — immutable events |
### Where events are recorded
| Event type | Recording point |
|---|---|
| `upload` | `UploadController::finish()` on publish |
| `follow` | `FollowService::follow()` |
| `award` | `ArtworkAwardController::store()` |
| `favorite` | `ArtworkInteractionController::favorite()` |
| `comment` | `ArtworkCommentController::store()` |
All via `ActivityEvent::record($actorId, $type, $targetType, $targetId, $meta)`.
### Feed filters
- **Global** — all recent events, newest first, paginated 30/page
- **Following** — `WHERE actor_id IN (following_ids)` — only events from users you follow
The controller enriches each event batch with its target objects in a single query per target type (no N+1).
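Avoiding N+1 here comes down to collecting target IDs per `target_type` first, then loading each type with one `whereIn`-style query. The grouping step can be sketched as follows; `targetIdsByType` is a hypothetical helper and events are represented as plain arrays for illustration.

```php
<?php

/**
 * Group the target IDs of an event batch by target type.
 *
 * @param array<int, array{target_type:string, target_id:int}> $events
 * @return array<string, int[]> target_type => unique target IDs
 */
function targetIdsByType(array $events): array
{
    $grouped = [];

    foreach ($events as $event) {
        // Using the ID as key dedupes targets that appear in several events
        $grouped[$event['target_type']][$event['target_id']] = true;
    }

    // With a single input array, array_map keeps the string keys
    return array_map('array_keys', $grouped);
}

$events = [
    ['target_type' => 'artwork', 'target_id' => 5],
    ['target_type' => 'user', 'target_id' => 2],
    ['target_type' => 'artwork', 'target_id' => 5],
    ['target_type' => 'artwork', 'target_id' => 8],
];
print_r(targetIdsByType($events));
```

With two target types (`artwork`, `user`), a page of 30 events then needs at most two extra queries regardless of how many distinct targets it references.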
---
## 10. Meilisearch Configuration
Configured in `config/scout.php` under `meilisearch.index-settings`.
Push settings to a running instance:
```bash
php artisan scout:sync-index-settings
```
### Artworks index settings
**Searchable attributes** (ranked in order):
1. `title`
2. `tags`
3. `author_name`
4. `description`
**Filterable attributes:**
`tags`, `category`, `content_type`, `orientation`, `resolution`, `author_id`, `is_public`, `is_approved`
**Sortable attributes:**
`created_at`, `downloads`, `likes`, `views`, `trending_score_24h`, `trending_score_7d`, `favorites_count`, `awards_received_count`, `downloads_count`
### toSearchableArray() — fields indexed per artwork
```php
[
'id', 'slug', 'title', 'description',
'author_id', 'author_name',
'category', 'content_type', 'tags',
'resolution', 'orientation',
'downloads', 'likes', 'views',
'created_at', 'is_public', 'is_approved',
'trending_score_24h', 'trending_score_7d',
'favorites_count', 'awards_received_count', 'downloads_count',
'awards' => { gold, silver, bronze, score },
]
```
---
## 11. Caching Strategy
| Data | Cache key | TTL | Driver |
|---|---|---|---|
| Homepage trending | `homepage.trending.{limit}` | 5 min | Redis/file |
| Homepage fresh | `homepage.fresh.{limit}` | 5 min | Redis/file |
| Homepage hero | `homepage.hero` | 5 min | Redis/file |
| Homepage tags | `homepage.tags.{limit}` | 5 min | Redis/file |
| User preferences | `user.prefs.{user_id}` | 5 min | Redis/file |
| Following feed | `discover.following.{user_id}.p{page}` | 1 min | Redis/file |
| Similar artworks | `api.similar.{artwork_id}` | 5 min | Redis/file |
**Rules:**
- Personalized data (`from_following`, `by_tags`, `by_categories`) is **not** independently cached — it falls inside `allForUser()` which is called fresh per request.
- No explicit cache busting: the trending and reset commands do not clear any cache. The TTLs are short enough that stale data expires on its own within one trending cycle.
---
## 12. Scheduled Jobs
All registered in `routes/console.php` via `Schedule::command()`.
| Time | Command | Purpose |
|---|---|---|
| Every 30 min | `skinbase:recalculate-trending --period=24h` | Update `trending_score_24h` |
| Every 30 min | `skinbase:recalculate-trending --period=7d --skip-index` | Update `trending_score_7d` (background) |
| 03:00 daily | `uploads:cleanup` | Remove stale draft uploads |
| 03:10 daily | `analytics:aggregate-similar-artworks` | Offline similarity metrics |
| 03:20 daily | `analytics:aggregate-feed` | Feed evaluation metrics |
| 03:30 daily | `skinbase:reset-windowed-stats --period=24h` | Zero views_24h, recompute downloads_24h |
| Monday 03:30 | `skinbase:reset-windowed-stats --period=7d` | Zero views_7d, recompute downloads_7d |
**Reset runs at 03:30** so it fires after the other maintenance tasks (03:00 to 03:20). The next trending recalculation (every 30 min, so at about 03:30 or 04:00) picks up the freshly zeroed windowed stats and writes accurate trending scores.
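As an illustration, the trending and reset rows of the table could be registered as shown below. This is a configuration sketch using standard Laravel scheduler frequency helpers, assuming the entries map one-to-one onto the table; the project's actual `routes/console.php` is not reproduced in this document and may add options such as overlap protection.

```php
use Illuminate\Support\Facades\Schedule;

// Trending recalculation, every 30 minutes
Schedule::command('skinbase:recalculate-trending --period=24h')->everyThirtyMinutes();
Schedule::command('skinbase:recalculate-trending --period=7d --skip-index')->everyThirtyMinutes();

// Windowed-stat resets, after the 03:00 to 03:20 maintenance tasks
Schedule::command('skinbase:reset-windowed-stats --period=24h')->dailyAt('03:30');
Schedule::command('skinbase:reset-windowed-stats --period=7d')->weeklyOn(1, '03:30'); // 1 = Monday
```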
---
## 13. Testing
All tests live under `tests/Feature/Discovery/`.
| Test file | Coverage |
|---|---|
| `ActivityEventRecordingTest.php` | `ActivityEvent::record()`, all 5 types, actor relation, meta, route smoke tests for the activity feed |
| `FollowingFeedTest.php` | Auth redirect, empty state fallback, pagination, creator exclusion |
| `HomepagePersonalizationTest.php` | Guest vs auth homepage sections, preferences shape, 200 responses |
| `SimilarArtworksApiTest.php` | 404 cases, response shape, result count ≤ 12, creator exclusion |
| `SignalTrackingTest.php` | View endpoint (404s, first count, session dedup), download endpoint (404s, DB row, guest vs auth), route names |
| `TrendingServiceTest.php` | Zero artworks, skip outside window, skip private/unapproved — _recalculate() tests skipped on SQLite (MySQL-only SQL)_ |
| `WindowedStatsTest.php` | `incrementViews/Downloads` update all 3 columns, reset command zeros views, recomputes downloads from log, window boundary correctness |
Run all discovery tests:
```bash
php artisan test tests/Feature/Discovery/
```
Run specific suite:
```bash
php artisan test tests/Feature/Discovery/SignalTrackingTest.php
```
**SQLite vs MySQL note:** Four tests in `TrendingServiceTest` are marked `->skip()` with the message _"Requires MySQL: uses GREATEST() and TIMESTAMPDIFF()"_. Run them against a real MySQL instance in CI or staging to validate the bulk UPDATE formula.
---
## 14. Operational Runbook
### Trending scores are stuck / not updating
```bash
# Check last calculated timestamp (SQL: run this statement in a MySQL client, not the shell)
SELECT id, title, last_trending_calculated_at FROM artworks ORDER BY last_trending_calculated_at DESC LIMIT 5;
# Manually trigger recalculation
php artisan skinbase:recalculate-trending --period=all
# Re-push scores to Meilisearch
php artisan skinbase:recalculate-trending --period=7d
```
### Windowed counters look wrong after a deploy
```bash
# Force a reset and recompute
php artisan skinbase:reset-windowed-stats --period=24h
php artisan skinbase:reset-windowed-stats --period=7d
# Then recalculate trending with fresh numbers
php artisan skinbase:recalculate-trending --period=all
```
### Meilisearch out of sync with DB
```bash
# Re-push all artworks in the trending window
php artisan skinbase:recalculate-trending --period=all
# Or full re-index
php artisan scout:import "App\Models\Artwork"
```
### Push updated index settings (after changing config/scout.php)
```bash
php artisan scout:sync-index-settings
```
### Check what the trending formula is reading
```sql
SELECT
a.id,
a.title,
a.published_at,
s.views,
s.views_24h,
s.views_7d,
s.downloads,
s.downloads_24h,
s.downloads_7d,
s.favorites,
a.trending_score_24h,
a.trending_score_7d,
a.last_trending_calculated_at
FROM artworks a
LEFT JOIN artwork_stats s ON s.artwork_id = a.id
WHERE a.is_public = 1 AND a.is_approved = 1
ORDER BY a.trending_score_7d DESC
LIMIT 20;
```
### Inspect the artwork_downloads log
```sql
-- Downloads in the last 24 hours per artwork
SELECT artwork_id, COUNT(*) as dl_24h
FROM artwork_downloads
WHERE created_at >= NOW() - INTERVAL 1 DAY
GROUP BY artwork_id
ORDER BY dl_24h DESC
LIMIT 20;
```