Skinbase Vision Stack (CLIP + BLIP + YOLO + Qdrant + Card Renderer + Maturity + LLM) – Dockerized FastAPI
This repository provides internal AI services for image analysis, vector search, card rendering, moderation, and text generation behind a single Gateway API.
Services & Ports
- gateway (exposed): https://vision.klevze.net
- clip: internal only
- blip: internal only
- yolo: internal only
- qdrant: vector DB (port 6333 exposed for direct access)
- qdrant-svc: internal Qdrant API wrapper
- card-renderer: internal card rendering service
- maturity: internal NSFW/maturity classifier service
- llm: internal text-generation service using a thin FastAPI shim over llama-server (profile-based, internal only)
Run
docker compose up -d --build
That starts the default vision stack only. The LLM service is disabled by default so operators are not forced to run Qwen3 on the same host.
To also start the local llama.cpp service:
docker compose --profile llm up -d --build
Before enabling the llm profile locally, place the GGUF model file described in models/qwen3/README.md and set LLM_ENABLED=true in .env.
If you use BLIP, create a .env file first.
Required variables:
API_KEY=your_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here
HUGGINGFACE_TOKEN is required when the configured BLIP model is private, gated, or otherwise requires Hugging Face authentication.
Optional maturity configuration (override in .env if needed):
MATURITY_MODEL=Falconsai/nsfw_image_detection
MATURITY_THRESHOLD_MATURE=0.80
MATURITY_THRESHOLD_REVIEW=0.60
MATURITY_ENABLED=true
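Under the assumed semantics of the two thresholds (a score at or above MATURITY_THRESHOLD_MATURE is mature, at or above MATURITY_THRESHOLD_REVIEW is review, anything lower is safe), the classification can be sketched as:

```python
def classify_maturity(score, threshold_mature=0.80, threshold_review=0.60):
    """Map a raw model score to a coarse label.

    Assumed semantics: scores at or above the mature threshold are
    "mature", scores at or above the review threshold are "review",
    and everything else is "safe".
    """
    if score >= threshold_mature:
        return "mature"
    if score >= threshold_review:
        return "review"
    return "safe"
```

The defaults mirror the .env values above; calibrate both thresholds against real content before relying on the labels.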
Optional LLM configuration:
LLM_ENABLED=false
LLM_URL=http://llm:8080
LLM_DEFAULT_MODEL=qwen3-1.7b-instruct-q4_k_m
LLM_TIMEOUT=120
LLM_MAX_TOKENS_DEFAULT=256
LLM_MAX_TOKENS_HARD_LIMIT=1024
LLM_MAX_REQUEST_BYTES=65536
# Local llm profile only
MODEL_PATH=/models/Qwen3-1.7B-Instruct-Q4_K_M.gguf
LLM_CONTEXT_SIZE=4096
LLM_THREADS=4
LLM_GPU_LAYERS=0
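A plausible sketch of how the two token limits interact (assumption: the gateway falls back to LLM_MAX_TOKENS_DEFAULT when a request omits max_tokens and clamps everything to LLM_MAX_TOKENS_HARD_LIMIT; the helper name is hypothetical):

```python
def resolve_max_tokens(requested=None, default=256, hard_limit=1024):
    """Resolve the effective max_tokens for a request (hypothetical helper).

    Assumption: a missing value falls back to LLM_MAX_TOKENS_DEFAULT,
    and the result is always clamped to LLM_MAX_TOKENS_HARD_LIMIT.
    """
    effective = default if requested is None else requested
    return min(effective, hard_limit)
```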
Recommended production topology for the LLM: keep the gateway on the current vision host and point LLM_URL at a separate private machine or VPN-reachable container host. Running the full vision stack and Qwen3 together on a small 4c/8GB VPS will usually degrade both.
Service startup now waits on container healthchecks, so first boot may take longer while models finish loading.
Health
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/health
LLM-specific gateway health:
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/ai/health
LLM Smoke Test
Use this checklist on a Docker-capable host after provisioning the GGUF file and setting LLM_ENABLED=true.
- Start the gateway and local LLM profile.
docker compose --profile llm up -d --build gateway llm
- Confirm the LLM container is running and healthy.
docker compose ps llm
docker compose logs --tail=100 llm
- Check the internal LLM health contract.
curl http://127.0.0.1:8080/health
Expected fields: status, model, context_size, threads.
- Check gateway health and LLM reachability.
curl -H "X-API-Key: <your-api-key>" http://127.0.0.1:8003/health
curl -H "X-API-Key: <your-api-key>" http://127.0.0.1:8003/ai/health
- Verify model discovery through the gateway.
curl -H "X-API-Key: <your-api-key>" http://127.0.0.1:8003/v1/models
- Run a short non-streaming chat completion.
curl -H "X-API-Key: <your-api-key>" -X POST http://127.0.0.1:8003/ai/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
{"role": "user", "content": "Write one sentence about an artist who creates cinematic sci-fi wallpaper packs."}
],
"max_tokens": 80
}'
- If anything fails, inspect the two relevant services first.
docker compose logs --tail=200 llm
docker compose logs --tail=200 gateway
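The health-contract check above can also be scripted; the field names come from the contract, while the sample values here are illustrative only:

```python
import json

REQUIRED_HEALTH_FIELDS = {"status", "model", "context_size", "threads"}

def missing_health_fields(raw_body):
    """Return any contract fields absent from a /health response body."""
    data = json.loads(raw_body)
    return sorted(REQUIRED_HEALTH_FIELDS - data.keys())

# Illustrative response; real values depend on your model and config.
sample = '{"status": "ok", "model": "qwen3-1.7b-instruct-q4_k_m", "context_size": 4096, "threads": 4}'
```

missing_health_fields(sample) returns an empty list when the contract is satisfied.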
Universal analyze (ALL)
With URL
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/all \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
With file upload (multipart)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/all/file \
-F "file=@/path/to/image.webp" \
-F "limit=5"
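For callers that prefer Python over curl, the JSON variant of the endpoint can be wrapped in a small stdlib-only client; the gateway URL and API key below are placeholders:

```python
import json
import urllib.request

GATEWAY = "https://vision.klevze.net"  # placeholder: your gateway URL
API_KEY = "your_api_key_here"          # placeholder: your X-API-Key value

def build_analyze_request(image_url, limit=5):
    """Build the POST request for /analyze/all (URL variant)."""
    body = json.dumps({"url": image_url, "limit": limit}).encode()
    return urllib.request.Request(
        f"{GATEWAY}/analyze/all",
        data=body,
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )

def analyze_all(image_url, limit=5):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_analyze_request(image_url, limit)) as resp:
        return json.load(resp)
```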
Individual services (via gateway)
CLIP tags
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/clip -H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
CLIP tags (file)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/clip/file \
-F "file=@/path/to/image.webp" \
-F "limit=5"
BLIP caption
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/blip -H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
BLIP caption (file)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/blip/file \
-F "file=@/path/to/image.webp" \
-F "variants=3" \
-F "max_length=60"
YOLO detect
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/yolo -H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
YOLO detect (file)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/yolo/file \
-F "file=@/path/to/image.webp" \
-F "conf=0.25"
Maturity / NSFW analysis
Analyzes an image and returns a normalized maturity signal for Nova moderation workflows.
Analyze by URL
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/maturity \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp"}'
Analyze from file upload
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/analyze/maturity/file \
-F "file=@/path/to/image.webp"
Example response:
{
  "maturity_label": "mature",
  "confidence": 0.94,
  "score": 0.94,
  "labels": ["nsfw"],
  "model": "Falconsai/nsfw_image_detection",
  "threshold_used": 0.80,
  "analysis_time_ms": 183.0,
  "source": "maturity-service",
  "action_hint": "flag_high",
  "advisory": "High-confidence mature content detected"
}
action_hint values: safe, review, flag_high. Nova should use these to decide blur/queue/flag behaviour.
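A minimal sketch of the mapping Nova could apply; only the three action_hint keys come from the service, and the behaviour names on the right are hypothetical placeholders:

```python
# Hypothetical client-side policy; only the keys are defined by the service.
ACTION_POLICY = {
    "safe": "show",               # render normally
    "review": "blur_and_queue",   # blur the image and queue for human review
    "flag_high": "flag",          # withhold and flag for moderation
}

def behaviour_for(action_hint):
    """Look up the moderation behaviour for an action_hint value."""
    try:
        return ACTION_POLICY[action_hint]
    except KeyError:
        raise ValueError(f"unknown action_hint: {action_hint!r}") from None
```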
Vector DB (Qdrant) via gateway
Qdrant point IDs must be either:
- an unsigned integer
- a UUID string
If you send another string value, the wrapper may replace it with a generated UUID. In that case the original value is stored in the payload as _original_id.
You can fetch a stored point by its preserved original application ID:
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/points/by-original-id/img-001
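The wrapper's ID policy can be mirrored client-side to predict what will be stored; this sketch assumes the behaviour described above (unsigned integers and UUID strings pass through, anything else is replaced by a generated UUID with the original value kept for the _original_id payload field):

```python
import uuid

def normalize_point_id(raw_id):
    """Return (point_id, original_id) under the wrapper's assumed policy.

    original_id is None when raw_id was already acceptable; otherwise it
    holds the value the wrapper would store as _original_id.
    """
    if isinstance(raw_id, int) and raw_id >= 0:
        return raw_id, None
    try:
        return str(uuid.UUID(str(raw_id))), None
    except ValueError:
        return str(uuid.uuid4()), str(raw_id)
```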
Store image embedding by URL
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/upsert \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"550e8400-e29b-41d4-a716-446655440000","metadata":{"category":"wallpaper"}}'
Store image embedding by file upload
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/upsert/file \
-F "file=@/path/to/image.webp" \
-F 'id=550e8400-e29b-41d4-a716-446655440001' \
-F 'metadata_json={"category":"photo"}'
Search similar images by URL
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5,"filter_metadata":{"is_public":true}}'
Optional search parameters: hnsw_ef (int), exact (bool), indexed_only (bool), score_threshold (float), filter_metadata (object).
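The optional parameters can be collected into the JSON body with a small helper that rejects typos; the parameter names come from the list above, the helper itself is illustrative:

```python
SEARCH_OPTIONS = {"hnsw_ef", "exact", "indexed_only", "score_threshold", "filter_metadata"}

def build_search_body(image_url, limit=5, **options):
    """Assemble the /vectors/search request body, validating option names."""
    unknown = set(options) - SEARCH_OPTIONS
    if unknown:
        raise ValueError(f"unsupported search option(s): {sorted(unknown)}")
    body = {"url": image_url, "limit": limit}
    body.update(options)
    return body
```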
Search similar images by file upload
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search/file \
-F "file=@/path/to/image.webp" \
-F "limit=5" \
-F 'filter_metadata_json={"is_public":true}'
List collections
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections
Get collection info
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images
Full diagnostic inspect
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/inspect
Returns HNSW config, optimizer config, quantization, segment count, payload index coverage, and RAM estimate for every collection.
Payload index management
# List indexes
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images/indexes
# Create a single index
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/indexes \
-H "Content-Type: application/json" \
-d '{"field":"is_public","type":"bool"}'
# Ensure multiple indexes (idempotent)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
-H "Content-Type: application/json" \
-d '{"fields":[{"field":"is_public","type":"bool"},{"field":"category_id","type":"integer"}]}'
Supported index types: keyword, integer, float, bool, geo, datetime, text, uuid.
Collection configuration (HNSW / optimizer / quantization)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/configure \
-H "Content-Type: application/json" \
-d '{"hnsw_m":16,"hnsw_ef_construct":200,"indexing_threshold":20000,"quantization_type":"int8"}'
Delete points
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/delete \
-H "Content-Type: application/json" \
-d '{"ids":["550e8400-e29b-41d4-a716-446655440000","550e8400-e29b-41d4-a716-446655440001"]}'
If you let the wrapper generate a UUID, use the returned id value for later get, search, or delete operations.
Card Renderer
List available templates
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/cards/templates
Render a card from a URL
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title","username":"@artist","template":"nova-artwork-v1"}'
Returns binary image bytes (WebP by default); add -o card.webp to the curl command to write the response to a file instead of the terminal.
Render a card from a file upload
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/file \
-F "file=@/path/to/image.webp" \
-F "title=Artwork Title" \
-F "username=@artist" \
-F "template=nova-artwork-v1"
Get card layout metadata (no image rendered)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/meta \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
LLM / Chat Completions
The gateway exposes stable text-generation endpoints backed by the internal llm service. They reuse the existing X-API-Key protection and keep the LLM container internal-only.
OpenAI-style chat endpoint
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
{"role": "user", "content": "Write a short creator biography for an artist who just hit 10,000 followers."}
],
"temperature": 0.7,
"max_tokens": 220,
"stream": false
}'
Project-friendly chat endpoint
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/ai/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
{"role": "user", "content": "Suggest metadata tags for a cyberpunk wallpaper pack."}
],
"max_tokens": 180
}'
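The /ai/chat body can be assembled with a small helper (the default system prompt and helper name are illustrative; only the messages/max_tokens shape comes from the examples above):

```python
def build_chat_body(user_text,
                    system_text="You are a concise assistant for Skinbase Nova.",
                    max_tokens=180):
    """Build the JSON body for POST /ai/chat (illustrative helper)."""
    return {
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,
    }
```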
List models
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/v1/models
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/ai/models
Notes
- Models are loaded at service startup; initial container start can take 1–2 minutes as model weights are downloaded.
- Qdrant data is persisted in the project folder at ./data/qdrant, so it survives container restarts and recreates.
- The local llm profile does not auto-download Qwen3 weights. Mount the GGUF file explicitly and let startup fail fast if it is missing.
- Remote image URLs are restricted to public http/https hosts. Localhost, private IP ranges, and non-image content types are rejected.
- The maturity service uses Falconsai/nsfw_image_detection (ViT-based). Thresholds are configurable via .env. The model handles photos and stylized digital art but should be calibrated against real Skinbase content before production use.
- For small VPS deployments, prefer LLM_ENABLED=true with LLM_URL pointing to a separate LLM host instead of running the llm profile on the same machine.
- For production: add auth, rate limits, and restrict gateway exposure (private network).
- GPU: you can add NVIDIA runtime later (compose profiles) if needed.