Files
SkinbaseNova/docs/slow-query-optimization-plan.md

474 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# MySQL Slow Query Optimization Plan
**Source:** `/var/log/mysql/slow.log` — 68,950 total queries, 151,576s total exec time
**Period:** 2026-04-04 → 2026-04-26 (22 days)
**Threshold:** 500ms
**Server:** server3 / database: `skinbase`
---
## Summary Stats
| Metric | Value |
|---|---|
| Total queries logged | 68,950 |
| Unique query fingerprints | 139 |
| Total execution time | 151,576s (~42h) |
| Average exec time | 2s |
| 95th percentile | 3s |
| Total rows examined | 15.39B |
| Total bytes sent | 40.79GB |
---
## Priority 1 — Critical (fix immediately)
### P1-A: Correlated subquery counting artworks per tag (Query 8)
**Total time:** 1,004s · **Calls:** 1,138 · **Rows examined/call:** ~240k
**Current query:**
```sql
SELECT tags.*,
(SELECT count(*) FROM artworks
INNER JOIN artwork_tag ON artworks.id = artwork_tag.artwork_id
WHERE tags.id = artwork_tag.tag_id AND artworks.deleted_at IS NULL)
AS artworks_count
FROM tags
ORDER BY artworks_count DESC
LIMIT 10
```
**Problem:** N+1 correlated subquery — one full `artworks JOIN artwork_tag` count per tag row.
**Fix options (pick one):**
1. **Best — cached counter column:** Add `artworks_count INT DEFAULT 0` to `tags`, maintained by an Eloquent observer on `artwork_tag` attach/detach. Query becomes `SELECT * FROM tags ORDER BY artworks_count DESC LIMIT 10` — instant.
2. **Quick — JOIN + GROUP BY:**
```sql
SELECT t.*, COALESCE(cnt.c, 0) AS artworks_count
FROM tags t
LEFT JOIN (
SELECT at.tag_id, COUNT(*) AS c
FROM artwork_tag at
JOIN artworks a ON a.id = at.artwork_id AND a.deleted_at IS NULL
GROUP BY at.tag_id
) cnt ON cnt.tag_id = t.id
ORDER BY artworks_count DESC
LIMIT 10;
```
**Migration needed:** `php artisan make:migration add_artworks_count_to_tags`
---
### P1-B: Correlated subquery counting artworks per user (Query 33)
**Total time:** 85s · **Calls:** 131 · **Rows examined/call:** ~61k
**Current query:**
```sql
SELECT users.*,
(SELECT count(*) FROM artworks
WHERE users.id = artworks.user_id AND is_approved=1 AND is_public=1 ...)
AS artworks_count
FROM users
HAVING artworks_count > 0
ORDER BY artworks_count DESC
LIMIT 6
```
**Problem:** Same correlated N+1 pattern per user. `HAVING` on a subquery forces a full users scan.
**Fix:** Use the existing `user_statistics` table. Add `public_artworks_count INT DEFAULT 0` if not present, maintained by artwork publish/unpublish observer. Then:
```sql
SELECT users.*, us.public_artworks_count AS artworks_count
FROM users
JOIN user_statistics us ON us.user_id = users.id
WHERE us.public_artworks_count > 0 AND users.deleted_at IS NULL
ORDER BY us.public_artworks_count DESC
LIMIT 6;
```
---
### P1-C: Jobs table LIKE scan on JSON payload (Query 38)
**Total time:** 64s · **Calls:** 58 · **Rows examined/call:** ~55k
**Current query:**
```sql
SELECT count(*) FROM jobs
WHERE payload LIKE '%AutoTagArtworkJob%' AND payload LIKE '%69756%'
```
**Problem:** Full scan of the `jobs` table JSON payload column — no index possible on LIKE '%...%'.
**Fix:** Replace this deduplication check with a Redis key or a dedicated `job_dedup` table with an indexed `(job_class, subject_id)` column. Example:
```php
// Instead of scanning jobs table:
if (Cache::has("auto-tag-queued:{$artworkId}")) return;
Cache::put("auto-tag-queued:{$artworkId}", true, now()->addHours(1));
AutoTagArtworkJob::dispatch($artwork);
```
---
### P1-D: `SELECT DISTINCT artwork_id` from snapshots with no index (Query 12)
**Total time:** 923s · **Calls:** 280 · **Rows examined/call:** ~3.97M (max 8.3M!)
**Current query:**
```sql
SELECT DISTINCT artwork_id
FROM artwork_metric_snapshots_hourly
WHERE bucket_hour BETWEEN '...' AND '...'
```
**Problem:** No covering index on `(bucket_hour, artwork_id)`. Full or large range scan every time.
**Fix:** Add a compound index:
```sql
ALTER TABLE artwork_metric_snapshots_hourly
ADD INDEX idx_bucket_artwork (bucket_hour, artwork_id);
```
Migration: `php artisan make:migration add_index_bucket_artwork_to_metric_snapshots`
---
### P1-E: Full-text LIKE searches on artworks title + description (Queries 50, 53)
**Total time:** ~62s combined · **Calls:** ~48 · Pattern: `WHERE title LIKE '%keyword%' OR description LIKE '%keyword%' OR ...` (20+ OR conditions)
**Problem:** Leading wildcard LIKE cannot use B-tree indexes. Full table scan every time.
**Fix:** Use **Meilisearch** (already in use for artwork search). Route AI-tag search queries through `ArtworkSearchService` instead of raw LIKE. For any fallback that must stay in MySQL:
```sql
-- Add FULLTEXT index:
ALTER TABLE artworks ADD FULLTEXT INDEX ft_artwork_text (title, description);
-- Then use MATCH..AGAINST instead of LIKE:
WHERE MATCH(title, description) AGAINST ('+moon +lunar' IN BOOLEAN MODE)
```
---
## Priority 2 — High Impact
### P2-A: Artwork aggregate stats queries — top 2 time consumers (Queries 1 & 2)
**Total time:** 83,766s + 34,068s = **117,834s** (78% of all slow query time)
**Calls:** 37,124 + 13,789 = ~51k · **Avg:** 2.2s · **Rows examined/call:** ~215234k
These are the same heavy SELECT pattern loading per-artwork stats from multiple tables:
```sql
SELECT a.id, a.user_id, a.published_at, a.is_public, a.is_approved,
(a.thumb_ext IS NOT NULL AND a.thumb_ext != '') AS has_thumbnail,
COALESCE(ast.views, 0) AS views_all,
COALESCE(ast.downloads, 0) AS downloads_all,
COALESCE(ast.favorites, 0) AS favourites_all,
COALESCE(cc.cnt, 0) AS comments_count,
COALESCE(sc.cnt, 0) AS shares_count,
COALESCE(ast.views_7d, 0) AS views_7d,
...
FROM artworks a
LEFT JOIN artwork_stats ast ON ast.artwork_id = a.id
LEFT JOIN (SELECT artwork_id, COUNT(*) cnt FROM artwork_favourites WHERE created_at >= ...) fav7 ...
LEFT JOIN (SELECT artwork_id, COUNT(*) cnt FROM artwork_comments ...) cc ...
LEFT JOIN (SELECT artwork_id, COUNT(*) cnt FROM artwork_shares ...) sc ...
WHERE a.is_public = 1 AND a.is_approved = 1 AND a.deleted_at IS NULL
AND a.published_at <= NOW()
ORDER BY a.id ASC
LIMIT 500
```
**Problems:**
- Inline derived subqueries for `fav7`, `cc`, `sc` run per page — not cached.
- `ORDER BY a.id` with filters requires index on `(deleted_at, is_public, is_approved, published_at, id)`.
- 51k calls/22 days = ~2,300 calls/day = every ~37 seconds, all day long. This is a **scheduled job or background process** churning through artworks.
**Fix — composite index (immediate):**
```sql
ALTER TABLE artworks
ADD INDEX idx_public_approved_published (deleted_at, is_public, is_approved, published_at, id);
```
**Fix — pre-aggregate counts (medium term):**
Ensure `artwork_stats` already stores `favorites_7d`, `comments_count`, `shares_count`. If so, move all counts to `artwork_stats` maintenance jobs and remove the inline derived joins. The SELECT becomes a single fast `LEFT JOIN artwork_stats`.
**Fix — reduce call frequency:**
If this is a scheduler-driven scan, batch it into chunks with exponential backoff and persist cursor position so it doesn't re-scan from scratch every run.
---
### P2-B: User stats calculation — 28 second average (Query 3)
**Total time:** 8,847s · **Calls:** 320 · **Avg:** 28s · **Max:** 180s!
Complex query joining `users`, `user_xp_logs`, `user_followers`, `artwork_likes`, `artworks`, `artwork_metric_snapshots_hourly`.
**Fix:** This should **never run on demand**. Route it to:
1. A scheduled background job that pre-aggregates into `user_statistics` (runs every N minutes).
2. The controller/service reads from `user_statistics` only — single-row lookup by `user_id`.
Also ensure `user_statistics` has `INDEX idx_user_id (user_id)`.
---
### P2-C: `artwork_metric_snapshots_hourly` heavy join query (Query 4)
**Total time:** 6,328s · **Calls:** 237 · **Avg:** 27s · **Max:** 52s
```sql
SELECT artworks.*, ...
FROM artworks
JOIN artwork_metric_snapshots_hourly amsh ON amsh.artwork_id = artworks.id
JOIN artwork_likes al ON ...
JOIN artwork_downloads ad ON ...
JOIN artwork_comments ac ON ...
WHERE amsh.bucket_hour BETWEEN ... AND ...
```
**Fix:**
1. Add index from P1-D: `idx_bucket_artwork (bucket_hour, artwork_id)`.
2. Pre-aggregate hourly snapshots into daily/weekly summary tables and query those instead.
3. Reduce time range of `BETWEEN` clause if querying recent data only.
---
### P2-D: rank_artwork_scores queries — ~10 variants (Queries 26, 27, 29, 37, 39, 41, 42, 43, 46)
**Total time:** ~750s combined · ~870 combined calls · **Avg:** ~750ms · **Rows examined/call:** ~140150k
Pattern:
```sql
SELECT ras.artwork_id, a.user_id, ras.score_trending
FROM rank_artwork_scores ras
INNER JOIN artworks a ON a.id = ras.artwork_id AND a.is_public=1 AND a.is_approved=1 AND a.deleted_at IS NULL
WHERE ras.model_version = 'rank_v2'
ORDER BY ras.score_trending DESC
LIMIT 200
```
**Problem:** `WHERE model_version = 'rank_v2' ORDER BY score_X DESC` — no composite index covers both.
**Fix — add partial/composite indexes:**
```sql
ALTER TABLE rank_artwork_scores
ADD INDEX idx_mv_trending (model_version, score_trending DESC),
ADD INDEX idx_mv_new_hot (model_version, score_new_hot DESC),
ADD INDEX idx_mv_best (model_version, score_best DESC),
ADD INDEX idx_mv_score (model_version, score_new_hot, score_trending, score_best);
```
**Fix — cache ranking results:**
These are pre-computed ranking scores. Cache the TOP 200 list per `(model_version, score_column)` for 515 minutes in Redis. The ranking job already runs on a schedule — warm the cache at the end of each ranking job run.
---
### P2-E: `artworks` public count via full scan (Query 30)
**Total time:** 100s · **Calls:** 127 · **Avg:** 790ms · **Rows examined/call:** 97k
```sql
SELECT count(*) FROM artworks
WHERE deleted_at IS NULL AND is_approved=1 AND is_public=1
AND published_at IS NOT NULL AND published_at <= NOW()
```
**Fix — maintain a counter cache:**
```php
// In config or cache:
Cache::remember('artworks.public_count', 300, fn() => Artwork::public()->published()->count());
```
Or store the count in a `site_statistics` / `system_settings` table, updated by the publish observer.
---
### P2-F: artworks ORDER BY id scanner for publish pipeline (Queries 18, 22, 23, 40, 47)
**Total time:** ~650s combined · Many calls · Pattern: `SELECT * FROM artworks WHERE ... ORDER BY id ASC` or `ORDER BY trending_score_7d DESC`
**Problem:** `SELECT *` loads all columns including large `description` blobs. The publish/ranking pipeline only needs IDs.
**Fix:**
- Use `SELECT id` or `SELECT id, user_id` instead of `SELECT *`.
- Ensure `trending_score_7d` is indexed if using `ORDER BY trending_score_7d`.
- Add index from P2-A: `idx_public_approved_published`.
---
## Priority 3 — Medium Impact
### P3-A: Popular tags query with tag_interaction_daily_metrics (Query 6)
**Total time:** 2,234s · **Calls:** 2,349 · **Avg:** 951ms
Joining `artworks → artwork_tag → tags → tag_interaction_daily_metrics`. 246k rows examined/call.
**Fix:**
- Cache the result: popular tags change slowly — cache 515 minutes.
- Add index on `tag_interaction_daily_metrics (tag_id, metric_date)`.
- Precompute `tag_interaction_daily_metrics` aggregates into a `tag_trending_scores` table.
---
### P3-B: Browse/gallery category + tag joins (Query 7)
**Total time:** 1,758s · **Calls:** 2,344 · **Avg:** 750ms
`artworks + categories + artwork_category + artwork_tag` — 80M total rows examined.
**Fix:**
- Verify indexes: `artwork_category(category_id, artwork_id)`, `artwork_tag(tag_id, artwork_id)`.
- Pagination: ensure cursor/keyset pagination is used, not `OFFSET`.
- Cache browse results per category (already partially done in HomepageService).
---
### P3-C: artwork_metric_snapshots_hourly backup full scan (Query 15)
**Total time:** 646s · **Calls:** 93 · **Max:** 127s · **User:** `backuper`
`SELECT /*!40001 SQL_NO_CACHE */ * FROM artwork_metric_snapshots_hourly` — mysqldump reading full table.
**Fix (ops):**
- Partition `artwork_metric_snapshots_hourly` by month on `bucket_hour`. Backup only reads the active partition.
- Or: exclude this table from hot backup and back it up separately during low-traffic window (02:0004:00 UTC).
- Archive data older than 90 days to a cold table.
---
### P3-D: `artworks LEFT JOIN artwork_stats` with OR condition (Query 32)
**Total time:** 86s · **Calls:** 110
```sql
WHERE (artworks.created_at >= '...' OR (s.ranking_score IS NOT NULL AND s.ranking_score > 0))
```
**Problem:** OR prevents index usage on `created_at`.
**Fix:** Rewrite as UNION:
```sql
SELECT id FROM artworks WHERE created_at >= '...' AND deleted_at IS NULL AND is_approved=1
UNION
SELECT a.id FROM artworks a JOIN artwork_stats s ON s.artwork_id = a.id
WHERE s.ranking_score > 0 AND a.deleted_at IS NULL AND a.is_approved=1
```
---
### P3-E: GROUP BY user_id from artworks (Query 31)
**Total time:** 92s · **Calls:** 110
```sql
SELECT a.user_id, COALESCE(us.followers_count, 0), COALESCE(us.favorites_received_count, 0)
FROM artworks a
LEFT JOIN user_statistics us ON us.user_id = a.user_id
WHERE a.is_public=1 AND a.is_approved=1 AND a.deleted_at IS NULL
GROUP BY a.user_id, us.followers_count, us.favorites_received_count
```
**Problem:** Full artworks scan + GROUP BY without a covering index.
**Fix:** Add index `(deleted_at, is_public, is_approved, user_id)` on artworks. Or pre-aggregate into `user_statistics` and query that directly without touching `artworks`.
---
### P3-F: Forum posts moderation scan (Query 62)
**Total time:** 12s · **Calls:** 15
```sql
SELECT * FROM forum_posts
WHERE (moderation_checked = 0 OR last_ai_scan_at IS NULL OR updated_at > last_ai_scan_at)
AND id IS NOT NULL AND deleted_at IS NULL
ORDER BY id ASC LIMIT 50
```
**Fix:** Add a partial index for unmoderated posts:
```sql
ALTER TABLE forum_posts ADD INDEX idx_needs_moderation (moderation_checked, last_ai_scan_at, id)
WHERE moderation_checked = 0;
```
Or maintain a `moderation_queue` table with only pending IDs.
---
## Recommended Index Migrations
Apply these in order (fastest wins first):
```bash
php artisan make:migration add_performance_indexes_batch1
```
```php
public function up(): void
{
// P1-D: snapshot bucket+artwork lookup
Schema::table('artwork_metric_snapshots_hourly', function (Blueprint $table) {
$table->index(['bucket_hour', 'artwork_id'], 'idx_bucket_artwork');
});
// P2-A + P2-F: public artwork scans & ORDER BY id
Schema::table('artworks', function (Blueprint $table) {
$table->index(
['deleted_at', 'is_public', 'is_approved', 'published_at', 'id'],
'idx_public_approved_published_id'
);
$table->index(
['deleted_at', 'is_public', 'is_approved', 'user_id'],
'idx_public_approved_user'
);
});
// P2-D: rank_artwork_scores per model_version
Schema::table('rank_artwork_scores', function (Blueprint $table) {
$table->index(['model_version', 'score_trending'], 'idx_mv_trending');
$table->index(['model_version', 'score_new_hot'], 'idx_mv_new_hot');
$table->index(['model_version', 'score_best'], 'idx_mv_best');
$table->index(['model_version', 'score_new_hot'], 'idx_mv_score');
});
// P3-A: tag daily metrics
Schema::table('tag_interaction_daily_metrics', function (Blueprint $table) {
$table->index(['tag_id', 'metric_date'], 'idx_tag_date');
});
// P1-A: tag artworks_count cached column
Schema::table('tags', function (Blueprint $table) {
$table->unsignedInteger('artworks_count')->default(0)->after('slug');
$table->index('artworks_count', 'idx_artworks_count');
});
}
```
---
## Caching Quick Wins (no schema change needed)
| Surface | Cache Key | TTL | Notes |
|---|---|---|---|
| Popular tags | `tags.popular.10` | 15 min | Currently: correlated subquery per request |
| Ranking top-200 lists | `rank.{model}.{score}.200` | 10 min | Warm at end of ranking job |
| Public artwork count | `artworks.public_count` | 5 min | Used in sitemaps, stats |
| User artworks_count | `user.{id}.artworks_count` | 5 min | Warm on publish/unpublish |
| Group leaderboard | `leaderboard.group.monthly.5` | 30 min | Already in leaderboards service |
---
## Implementation Roadmap
| Phase | Items | Effort | Expected gain |
|---|---|---|---|
| **Week 1** | P1-A correlated tag query, P1-C jobs LIKE, P1-D snapshot index, all P2-D rank indexes | LowMed | ~1520% reduction in slow queries |
| **Week 2** | P2-A artworks composite index, P1-B user artworks_count, caching quick wins table | Med | ~4050% reduction; kills #1 and #2 slow query families |
| **Week 3** | P2-B user stats background job, P2-C snapshot pre-aggregation, P1-E fulltext index | High | ~6575% reduction; kills 28s queries |
| **Month 2** | P2-F SELECT * → SELECT id, P3-C partition snapshots table, P3-D OR→UNION rewrite | High | Remaining tail |
---
## Monitoring After Changes
```bash
# Reset slow query log on server
mysql -e "FLUSH SLOW LOGS;"
# Re-run pt-query-digest after 1 week:
pt-query-digest /var/log/mysql/slow.log > /tmp/slow-report-after.txt
# Check query plan for top queries:
EXPLAIN SELECT ... \G
```
Key metrics to watch:
- Total slow query count per day (target: -50% in week 2)
- `Rows_examined` for artwork queries (target: <10k instead of 234k)
- MySQL CPU usage during ranking job windows