# Skinbase Vision Stack (CLIP + BLIP + YOLO + Qdrant + Card Renderer + Maturity + LLM) – Dockerized FastAPI

This repository provides internal AI services for image analysis, vector search, card rendering, moderation, and text generation behind a single **Gateway API**.

## Services & Ports

- `gateway` (exposed): `https://vision.klevze.net`
- `clip`: internal only
- `blip`: internal only
- `yolo`: internal only
- `qdrant`: vector DB (port `6333` exposed for direct access)
- `qdrant-svc`: internal Qdrant API wrapper
- `card-renderer`: internal card rendering service
- `maturity`: internal NSFW/maturity classifier service
- `llm`: text-generation service, a thin FastAPI shim over `llama-server` (profile-based, internal only)

## Run

```bash
docker compose up -d --build
```

That starts the default vision stack only. The LLM service is disabled by default so operators are not forced to run Qwen3 on the same host.

To also start the local llama.cpp service:

```bash
docker compose --profile llm up -d --build
```

Before enabling the `llm` profile locally, place the GGUF model file described in [models/qwen3/README.md](models/qwen3/README.md) and set `LLM_ENABLED=true` in `.env`.

If you use BLIP, create a `.env` file first. Required variables:

```bash
API_KEY=your_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here
```

`HUGGINGFACE_TOKEN` is required when the configured BLIP model is private, gated, or otherwise requires Hugging Face authentication.

Optional maturity configuration (override in `.env` if needed):

```bash
MATURITY_MODEL=Falconsai/nsfw_image_detection
MATURITY_THRESHOLD_MATURE=0.80
MATURITY_THRESHOLD_REVIEW=0.60
MATURITY_ENABLED=true
```

Optional LLM configuration:

```bash
LLM_ENABLED=false
LLM_URL=http://llm:8080
LLM_DEFAULT_MODEL=qwen3-1.7b-instruct-q4_k_m
LLM_TIMEOUT=120
LLM_MAX_TOKENS_DEFAULT=256
LLM_MAX_TOKENS_HARD_LIMIT=1024
LLM_MAX_REQUEST_BYTES=65536

# Local llm profile only
MODEL_PATH=/models/Qwen3-1.7B-Instruct-Q4_K_M.gguf
LLM_CONTEXT_SIZE=4096
LLM_THREADS=4
LLM_GPU_LAYERS=0
```

Recommended production topology for the LLM: keep the gateway on the current vision host and point `LLM_URL` at a separate private machine or VPN-reachable container host. Running the full vision stack and Qwen3 together on a small 4c/8GB VPS will usually degrade both.

Service startup waits on container healthchecks, so first boot may take longer while models finish loading.

## Health

```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/health
```

LLM-specific gateway health:

```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/ai/health
```

## LLM Smoke Test

Use this checklist on a Docker-capable host after provisioning the GGUF file and setting `LLM_ENABLED=true`.

1. Start the gateway and local LLM profile.

```bash
docker compose --profile llm up -d --build gateway llm
```

2. Confirm the LLM container is running and healthy.

```bash
docker compose ps llm
docker compose logs --tail=100 llm
```

3. Check the internal LLM health contract.

```bash
curl http://127.0.0.1:8080/health
```

Expected fields: `status`, `model`, `context_size`, `threads`.

4. Check gateway health and LLM reachability.

```bash
curl -H "X-API-Key: <your-api-key>" http://127.0.0.1:8003/health
curl -H "X-API-Key: <your-api-key>" http://127.0.0.1:8003/ai/health
```

5. Verify model discovery through the gateway.

```bash
curl -H "X-API-Key: <your-api-key>" http://127.0.0.1:8003/v1/models
```

6. Run a short non-streaming chat completion.

```bash
curl -H "X-API-Key: <your-api-key>" -X POST http://127.0.0.1:8003/ai/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
      {"role": "user", "content": "Write one sentence about an artist who creates cinematic sci-fi wallpaper packs."}
    ],
    "max_tokens": 80
  }'
```

7. If anything fails, inspect the two relevant services first.

```bash
docker compose logs --tail=200 llm
docker compose logs --tail=200 gateway
```

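The step-3 health contract can also be checked mechanically when scripting this checklist. A small Python helper (hypothetical, not shipped with the stack) that reports which required fields are missing from a parsed `/health` payload:

```python
# Fields the internal LLM /health endpoint is expected to return.
REQUIRED_HEALTH_FIELDS = {"status", "model", "context_size", "threads"}

def missing_health_fields(payload):
    """Return the required /health fields absent from the parsed JSON dict."""
    return sorted(REQUIRED_HEALTH_FIELDS - payload.keys())
```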
## Universal analyze (ALL)

### With URL
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/all \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

### With file upload (multipart)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/all/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

## Individual services (via gateway)

### CLIP tags
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/clip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

### CLIP tags (file)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/clip/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

### BLIP caption
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/blip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
```

### BLIP caption (file)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/blip/file \
  -F "file=@/path/to/image.webp" \
  -F "variants=3" \
  -F "max_length=60"
```

### YOLO detect
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/yolo \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
```

### YOLO detect (file)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/yolo/file \
  -F "file=@/path/to/image.webp" \
  -F "conf=0.25"
```

## Maturity / NSFW analysis

Analyzes an image and returns a normalized maturity signal for Nova moderation workflows.

### Analyze by URL
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/maturity \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp"}'
```

### Analyze from file upload
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/maturity/file \
  -F "file=@/path/to/image.webp"
```

Example response:
```json
{
  "maturity_label": "mature",
  "confidence": 0.94,
  "score": 0.94,
  "labels": ["nsfw"],
  "model": "Falconsai/nsfw_image_detection",
  "threshold_used": 0.80,
  "analysis_time_ms": 183.0,
  "source": "maturity-service",
  "action_hint": "flag_high",
  "advisory": "High-confidence mature content detected"
}
```

`action_hint` values: `safe`, `review`, `flag_high`. Nova should use these to decide blur/queue/flag behaviour.

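The hint follows from the two `.env` thresholds. A minimal Python sketch, assuming the service treats `MATURITY_THRESHOLD_MATURE` and `MATURITY_THRESHOLD_REVIEW` as inclusive lower bounds on the raw score (the function name is illustrative):

```python
def action_hint(score, mature=0.80, review=0.60):
    """Map a raw NSFW score to an action_hint.

    mature -> MATURITY_THRESHOLD_MATURE
    review -> MATURITY_THRESHOLD_REVIEW
    """
    if score >= mature:
        return "flag_high"  # high-confidence mature content
    if score >= review:
        return "review"     # borderline, queue for human review
    return "safe"
```

With the example response above, a score of `0.94` clears the `0.80` threshold and yields `flag_high`.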
## Vector DB (Qdrant) via gateway

Qdrant point IDs must be either:

- an unsigned integer
- a UUID string

If you send any other string value, the wrapper may replace it with a generated UUID. In that case the original value is stored in the payload as `_original_id`.

You can fetch a stored point by its preserved original application ID:

```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/points/by-original-id/img-001
```

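The ID rules above can be reproduced client-side to predict what the wrapper will store. A sketch under the assumption that the wrapper behaves exactly as described (`normalize_point_id` is an illustrative helper, not the wrapper's actual code):

```python
import uuid

def normalize_point_id(raw):
    """Return (qdrant_id, extra_payload) following the wrapper's ID rules."""
    # Unsigned integers pass through unchanged.
    if isinstance(raw, int) and raw >= 0:
        return raw, {}
    try:
        # Valid UUID strings pass through in canonical form.
        return str(uuid.UUID(str(raw))), {}
    except ValueError:
        # Anything else is replaced; the original is kept as _original_id.
        return str(uuid.uuid4()), {"_original_id": str(raw)}
```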
### Store image embedding by URL
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/upsert \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"550e8400-e29b-41d4-a716-446655440000","metadata":{"category":"wallpaper"}}'
```

### Store image embedding by file upload
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/upsert/file \
  -F "file=@/path/to/image.webp" \
  -F 'id=550e8400-e29b-41d4-a716-446655440001' \
  -F 'metadata_json={"category":"photo"}'
```

### Search similar images by URL
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5,"filter_metadata":{"is_public":true}}'
```

Optional search parameters: `hnsw_ef` (int), `exact` (bool), `indexed_only` (bool), `score_threshold` (float), `filter_metadata` (object).

### Search similar images by file upload
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5" \
  -F 'filter_metadata_json={"is_public":true}'
```

### List collections
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections
```

### Get collection info
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images
```

### Full diagnostic inspect
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/inspect
```

Returns HNSW config, optimizer config, quantization, segment count, payload index coverage, and a RAM estimate for every collection.

### Payload index management
```bash
# List indexes
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images/indexes

# Create a single index
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/indexes \
  -H "Content-Type: application/json" \
  -d '{"field":"is_public","type":"bool"}'

# Ensure multiple indexes (idempotent)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
  -H "Content-Type: application/json" \
  -d '{"fields":[{"field":"is_public","type":"bool"},{"field":"category_id","type":"integer"}]}'
```

Supported index types: `keyword`, `integer`, `float`, `bool`, `geo`, `datetime`, `text`, `uuid`.

### Collection configuration (HNSW / optimizer / quantization)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/configure \
  -H "Content-Type: application/json" \
  -d '{"hnsw_m":16,"hnsw_ef_construct":200,"indexing_threshold":20000,"quantization_type":"int8"}'
```

### Delete points
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/delete \
  -H "Content-Type: application/json" \
  -d '{"ids":["550e8400-e29b-41d4-a716-446655440000","550e8400-e29b-41d4-a716-446655440001"]}'
```

If you let the wrapper generate a UUID, use the returned `id` value for later `get`, `search`, or `delete` operations.

## Card Renderer

### List available templates
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/cards/templates
```

### Render a card from a URL
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title","username":"@artist","template":"nova-artwork-v1"}'
```

Returns binary image bytes (WebP by default).

### Render a card from a file upload
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/file \
  -F "file=@/path/to/image.webp" \
  -F "title=Artwork Title" \
  -F "username=@artist" \
  -F "template=nova-artwork-v1"
```

### Get card layout metadata (no image rendered)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/meta \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
```

## LLM / Chat Completions

The gateway exposes stable text-generation endpoints backed by the internal `llm` service. They reuse the existing `X-API-Key` protection and keep the LLM container internal-only.

### OpenAI-style chat endpoint
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
      {"role": "user", "content": "Write a short creator biography for an artist who just hit 10,000 followers."}
    ],
    "temperature": 0.7,
    "max_tokens": 220,
    "stream": false
  }'
```

### Project-friendly chat endpoint
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/ai/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
      {"role": "user", "content": "Suggest metadata tags for a cyberpunk wallpaper pack."}
    ],
    "max_tokens": 180
  }'
```

### List models
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/v1/models
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/ai/models
```

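For programmatic callers, the `/ai/chat` body is plain JSON, and the `LLM_MAX_REQUEST_BYTES` cap can be enforced client-side before sending. A minimal Python sketch (`build_chat_request` is an illustrative helper, not part of any shipped client; send the result with your HTTP library of choice):

```python
import json

GATEWAY = "https://vision.klevze.net"

def build_chat_request(messages, max_tokens=180, max_request_bytes=65536):
    """Build (url, headers, body) for POST /ai/chat, rejecting oversized bodies.

    max_request_bytes mirrors LLM_MAX_REQUEST_BYTES from .env.
    """
    body = json.dumps({"messages": messages, "max_tokens": max_tokens}).encode("utf-8")
    if len(body) > max_request_bytes:
        raise ValueError(f"request body is {len(body)} bytes, cap is {max_request_bytes}")
    headers = {"X-API-Key": "<your-api-key>", "Content-Type": "application/json"}
    return f"{GATEWAY}/ai/chat", headers, body
```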
## Notes

- Models are loaded at service startup; the first container start can take 1–2 minutes while model weights are downloaded.
- Qdrant data is persisted in the project folder at `./data/qdrant`, so it survives container restarts and recreates.
- The local `llm` profile does **not** auto-download Qwen3 weights. Mount the GGUF file explicitly and let startup fail fast if it is missing.
- Remote image URLs are restricted to public `http`/`https` hosts. Localhost, private IP ranges, and non-image content types are rejected.
- The maturity service uses `Falconsai/nsfw_image_detection` (ViT-based). Thresholds are configurable via `.env`. The model handles photos and stylized digital art but should be calibrated against real Skinbase content before production use.
- For small VPS deployments, prefer `LLM_ENABLED=true` with `LLM_URL` pointing to a separate LLM host instead of running the `llm` profile on the same machine.
- For production: add auth, rate limits, and restrict gateway exposure to a private network.
- GPU: NVIDIA runtime support can be added later via compose profiles if needed.