# Skinbase Vision Stack — Usage Guide
This document explains how to run and use the Skinbase Vision Stack (Gateway + CLIP, BLIP, YOLO, Qdrant services).
## Overview

- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc` (each a FastAPI app, except `qdrant`, which is the official Qdrant DB).
- The gateway is the public API endpoint; the other services are internal.
## Model overview

- CLIP (Contrastive Language–Image Pretraining): maps images and text into a shared embedding space. Used for zero-shot image tagging, similarity search, and returning ranked tags with confidence scores.
- BLIP (Bootstrapping Language-Image Pre-training): a vision–language model for image captioning and multimodal generation. BLIP produces human-readable captions (multiple `variants` supported) and can be tuned with `max_length`.
- YOLO (You Only Look Once): a family of real-time object-detection models. YOLO returns detected objects with `class`, `confidence`, and `bbox` (bounding box coordinates); use `conf` to filter low-confidence detections.
- Qdrant: a high-performance vector similarity search engine. It stores CLIP image embeddings and enables reverse image search (finding similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.
## Prerequisites

- Docker Desktop (with `docker compose`) or a comparable Docker environment.
- Recommended: at least 8 GB of RAM for CPU-only use; more for model memory or GPU use.
## Start the stack

Run from the repository root:

```sh
docker compose up -d --build
```

Stop:

```sh
docker compose down
```

View logs:

```sh
docker compose logs -f
docker compose logs -f gateway
```
## Health

Check the gateway health endpoint:

```sh
curl https://vision.klevze.net/health
```
## Universal analyze (ALL)

Analyze an image by URL (the gateway aggregates CLIP, BLIP, and YOLO):

```sh
curl -X POST https://vision.klevze.net/analyze/all \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload (multipart):

```sh
curl -X POST https://vision.klevze.net/analyze/all/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Parameters:

- `limit`: optional integer limiting the number of returned tag/caption items.
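The same request can be issued from Python. A minimal standard-library sketch; the `analyze_all` helper name and the 60-second timeout are illustrative choices, not part of the API, and the response schema is whatever the gateway returns:

```python
import json
from urllib import request

GATEWAY = "https://vision.klevze.net"  # public gateway from this guide

def build_analyze_body(url, limit=None):
    """Build the JSON body for /analyze/all; `limit` is optional."""
    payload = {"url": url}
    if limit is not None:
        payload["limit"] = limit
    return json.dumps(payload).encode("utf-8")

def analyze_all(image_url, limit=5):
    """POST to /analyze/all and return the parsed JSON response."""
    req = request.Request(
        f"{GATEWAY}/analyze/all",
        data=build_analyze_body(image_url, limit),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```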
## Individual services (via gateway)

These endpoints call a specific service through the gateway.
### CLIP — tags

URL request:

```sh
curl -X POST https://vision.klevze.net/analyze/clip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload:

```sh
curl -X POST https://vision.klevze.net/analyze/clip/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Returns: a JSON list of tags with confidence scores.
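This guide does not pin down the exact response schema, so the sketch below assumes each tag arrives as a `{"tag": ..., "score": ...}` object; it shows a typical client-side sort-and-threshold pass:

```python
def top_tags(tags, min_score=0.2):
    """Sort tags by score and drop low-confidence ones.

    The {"tag": ..., "score": ...} shape is an assumption, not a documented schema.
    """
    kept = [t for t in tags if t["score"] >= min_score]
    return sorted(kept, key=lambda t: t["score"], reverse=True)

sample = [
    {"tag": "landscape", "score": 0.81},
    {"tag": "night", "score": 0.10},
    {"tag": "mountain", "score": 0.55},
]
print(top_tags(sample))  # "landscape" first; "night" filtered out
```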
### BLIP — captioning

URL request:

```sh
curl -X POST https://vision.klevze.net/analyze/blip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
```

File upload:

```sh
curl -X POST https://vision.klevze.net/analyze/blip/file \
  -F "file=@/path/to/image.webp" \
  -F "variants=3" \
  -F "max_length=60"
```

Parameters:

- `variants`: number of caption variants to return.
- `max_length`: optional maximum caption length.

Returns: one or more caption strings (optionally with scores).
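Since captions may arrive with or without scores, a caller often has to pick one variant. The sketch below assumes each variant is either a plain string or a `{"caption": ..., "score": ...}` object; that shape is a guess, not a documented schema:

```python
def pick_caption(variants):
    """Pick one caption from a list of BLIP variants.

    Assumes each variant is a plain string or a {"caption", "score"} object
    (scores are optional per the guide); this shape is an assumption.
    """
    if not variants:
        return None
    if isinstance(variants[0], str):
        return variants[0]  # no scores: take the first variant
    return max(variants, key=lambda v: v.get("score", 0.0))["caption"]
```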
### YOLO — object detection

URL request:

```sh
curl -X POST https://vision.klevze.net/analyze/yolo \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
```

File upload:

```sh
curl -X POST https://vision.klevze.net/analyze/yolo/file \
  -F "file=@/path/to/image.webp" \
  -F "conf=0.25"
```

Parameters:

- `conf`: confidence threshold (0.0–1.0).

Returns: detected objects with `class`, `confidence`, and `bbox` (bounding box coordinates).
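Clients sometimes re-filter detections locally with a stricter threshold than the one sent to the service. The sketch below assumes `bbox` is `[x1, y1, x2, y2]` pixel corners; the guide does not specify the coordinate layout, so treat that as an assumption:

```python
def filter_detections(detections, conf=0.25):
    """Re-filter {"class", "confidence", "bbox"} objects client-side.

    Assumes bbox is [x1, y1, x2, y2] pixel corners (not documented here);
    adds a derived "area" field for each kept detection.
    """
    kept = []
    for det in detections:
        if det["confidence"] < conf:
            continue
        x1, y1, x2, y2 = det["bbox"]
        kept.append({**det, "area": (x2 - x1) * (y2 - y1)})
    return kept
```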
## Qdrant — vector storage & similarity search
The Qdrant integration lets you store image embeddings and find visually similar images. Embeddings are generated automatically by the CLIP service.
### Upsert (store) an image by URL

```sh
curl -X POST https://vision.klevze.net/vectors/upsert \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"img-001","metadata":{"category":"wallpaper","source":"upload"}}'
```

Parameters:

- `url` (required): image URL to embed and store.
- `id` (optional): custom string ID for the point; auto-generated if omitted.
- `metadata` (optional): arbitrary key-value payload stored alongside the vector.
- `collection` (optional): target collection name (defaults to `images`).
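Building the upsert body programmatically mirrors the parameter list above: omitting the optional fields lets the service apply its documented defaults (auto-generated `id`, `images` collection). A minimal sketch; the helper name is illustrative:

```python
import json

def build_upsert_body(url, point_id=None, metadata=None, collection=None):
    """Build the JSON body for /vectors/upsert.

    Optional fields are omitted entirely so the service falls back to its
    documented defaults (auto-generated id, the `images` collection).
    """
    payload = {"url": url}
    if point_id is not None:
        payload["id"] = point_id
    if metadata is not None:
        payload["metadata"] = metadata
    if collection is not None:
        payload["collection"] = collection
    return json.dumps(payload).encode("utf-8")
```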
### Upsert by file upload

```sh
curl -X POST https://vision.klevze.net/vectors/upsert/file \
  -F "file=@/path/to/image.webp" \
  -F 'id=img-002' \
  -F 'metadata_json={"category":"photo"}'
```
### Upsert a pre-computed vector

```sh
curl -X POST https://vision.klevze.net/vectors/upsert/vector \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"id":"img-003","metadata":{"custom":"data"}}'
```
### Search similar images by URL

```sh
curl -X POST https://vision.klevze.net/vectors/search \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

Parameters:

- `url` (required): query image URL.
- `limit` (optional, default 5): number of results.
- `score_threshold` (optional): minimum cosine similarity (0.0–1.0).
- `filter_metadata` (optional): filter results by metadata, e.g. `{"category":"wallpaper"}`.
- `collection` (optional): collection to search.

Returns: a list of `{"id", "score", "metadata"}` objects sorted by similarity.
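The `score` compared against `score_threshold` is cosine similarity: the dot product of the two embeddings divided by the product of their norms, which is 1.0 for identical directions and 0.0 for orthogonal vectors. A small standalone illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|).

    This is the similarity measure behind the `score` field when a
    collection uses the cosine distance.
    """
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```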
### Search by file upload

```sh
curl -X POST https://vision.klevze.net/vectors/search/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```
### Search by pre-computed vector

```sh
curl -X POST https://vision.klevze.net/vectors/search/vector \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"limit":5}'
```
### Collection management

List all collections:

```sh
curl https://vision.klevze.net/vectors/collections
```

Get collection info:

```sh
curl https://vision.klevze.net/vectors/collections/images
```

Create a custom collection:

```sh
curl -X POST https://vision.klevze.net/vectors/collections \
  -H "Content-Type: application/json" \
  -d '{"name":"my_collection","vector_dim":512,"distance":"cosine"}'
```

Delete a collection:

```sh
curl -X DELETE https://vision.klevze.net/vectors/collections/my_collection
```
### Delete points

```sh
curl -X POST https://vision.klevze.net/vectors/delete \
  -H "Content-Type: application/json" \
  -d '{"ids":["img-001","img-002"]}'
```
### Get a point by ID

```sh
curl https://vision.klevze.net/vectors/points/img-001
```
## Request/Response notes

- For URL requests, use `Content-Type: application/json`.
- For uploads, use `multipart/form-data` with a `file` field.
- The gateway aggregates and normalizes outputs for `/analyze/all`.
## Running a single service

To run only one service via Docker Compose:

```sh
docker compose up -d --build clip
```

Or run it locally (in a Python environment) from the service folder:

```sh
# inside clip/, blip/, or yolo/
uvicorn main:app --host 0.0.0.0 --port 8000
```
## Production tips
- Add authentication (API keys or OAuth) at the gateway.
- Add rate-limiting and per-client quotas.
- Keep model services on an internal Docker network.
- For GPU: enable NVIDIA runtime and update service Dockerfiles / compose profiles.
## Troubleshooting

- Service fails to start: check `docker compose logs <service>` for model load errors.
- High memory / OOM: increase host memory or reduce the model footprint; consider GPUs.
- Slow startup: model weights are loaded when a service starts, so expect some extra time.
## Extending

- Swap or update the model in each service by editing that service's `main.py`.
- Add request validation, timeouts, and retries in the gateway to improve robustness.
## Files of interest

- `docker-compose.yml` — composition and service definitions.
- `gateway/` — gateway FastAPI server.
- `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
- `qdrant/` — Qdrant API wrapper service (FastAPI).
- `common/` — shared helpers (e.g., image I/O).