# Skinbase Vision Stack — Usage Guide
This document explains how to run and use the Skinbase Vision Stack (Gateway + CLIP, BLIP, YOLO, Qdrant services).
## Overview

- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc` (each a FastAPI app, except `qdrant`, which is the official Qdrant DB).
- The gateway is the public API endpoint; the other services are internal.
## Model overview

- CLIP (Contrastive Language–Image Pretraining): maps images and text into a shared embedding space. Used for zero-shot image tagging, similarity search, and returning ranked tags with confidence scores.
- BLIP (Bootstrapping Language-Image Pre-training): a vision–language model for image captioning and multimodal generation. BLIP produces human-readable captions (multiple `variants` supported) and can be tuned with `max_length`.
- YOLO (You Only Look Once): a family of real-time object-detection models. YOLO returns detected objects with `class`, `confidence`, and `bbox` (bounding box coordinates); use `conf` to filter low-confidence detections.
- Qdrant: a high-performance vector similarity search engine. It stores CLIP image embeddings and enables reverse image search (finding similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.
## Prerequisites

- Docker Desktop (with `docker compose`) or a comparable Docker environment.
- Recommended: at least 8 GB of RAM for CPU-only use; more for model memory or GPU use.
## Start the stack

Run from the repository root:

```sh
docker compose up -d --build
```

Stop:

```sh
docker compose down
```

View logs:

```sh
docker compose logs -f
docker compose logs -f gateway
```
## Health

Check the gateway health endpoint:

```sh
curl https://vision.klevze.net/health
```
## Universal analyze (ALL)

Analyze an image by URL (the gateway aggregates CLIP, BLIP, and YOLO):

```sh
curl -X POST https://vision.klevze.net/analyze/all \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload (multipart):

```sh
curl -X POST https://vision.klevze.net/analyze/all/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Parameters:

- `limit`: optional integer limiting the number of returned tag/caption items.
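The same request can be issued from Python. A minimal standard-library sketch; the `analyze_all` helper name and the 60-second timeout are illustrative choices, not part of the API, and the response schema is whatever the gateway returns:

```python
import json
from urllib import request

GATEWAY = "https://vision.klevze.net"  # public gateway from this guide

def build_analyze_body(url, limit=None):
    """Build the JSON body for /analyze/all; `limit` is optional."""
    payload = {"url": url}
    if limit is not None:
        payload["limit"] = limit
    return json.dumps(payload).encode("utf-8")

def analyze_all(image_url, limit=5):
    """POST to /analyze/all and return the parsed JSON response."""
    req = request.Request(
        f"{GATEWAY}/analyze/all",
        data=build_analyze_body(image_url, limit),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```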
## Individual services (via gateway)

These endpoints call a specific service through the gateway.
### CLIP — tags

URL request:

```sh
curl -X POST https://vision.klevze.net/analyze/clip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload:

```sh
curl -X POST https://vision.klevze.net/analyze/clip/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Returns: a JSON list of tags with confidence scores.
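This guide does not pin down the exact response schema, so the sketch below assumes each tag arrives as a `{"tag": ..., "score": ...}` object; it shows a typical client-side sort-and-threshold pass:

```python
def top_tags(tags, min_score=0.2):
    """Sort tags by score and drop low-confidence ones.

    The {"tag": ..., "score": ...} shape is an assumption, not a documented schema.
    """
    kept = [t for t in tags if t["score"] >= min_score]
    return sorted(kept, key=lambda t: t["score"], reverse=True)

sample = [
    {"tag": "landscape", "score": 0.81},
    {"tag": "night", "score": 0.10},
    {"tag": "mountain", "score": 0.55},
]
print(top_tags(sample))  # "landscape" first; "night" filtered out
```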
### BLIP — captioning

URL request:

```sh
curl -X POST https://vision.klevze.net/analyze/blip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
```

File upload:

```sh
curl -X POST https://vision.klevze.net/analyze/blip/file \
  -F "file=@/path/to/image.webp" \
  -F "variants=3" \
  -F "max_length=60"
```

Parameters:

- `variants`: number of caption variants to return.
- `max_length`: optional maximum caption length.

Returns: one or more caption strings (optionally with scores).
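Since captions may arrive with or without scores, a caller often has to pick one variant. The sketch below assumes each variant is either a plain string or a `{"caption": ..., "score": ...}` object; that shape is a guess, not a documented schema:

```python
def pick_caption(variants):
    """Pick one caption from a list of BLIP variants.

    Assumes each variant is a plain string or a {"caption", "score"} object
    (scores are optional per the guide); this shape is an assumption.
    """
    if not variants:
        return None
    if isinstance(variants[0], str):
        return variants[0]  # no scores: take the first variant
    return max(variants, key=lambda v: v.get("score", 0.0))["caption"]
```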
### YOLO — object detection

URL request:

```sh
curl -X POST https://vision.klevze.net/analyze/yolo \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
```

File upload:

```sh
curl -X POST https://vision.klevze.net/analyze/yolo/file \
  -F "file=@/path/to/image.webp" \
  -F "conf=0.25"
```

Parameters:

- `conf`: confidence threshold (0.0–1.0).

Returns: detected objects with `class`, `confidence`, and `bbox` (bounding box coordinates).
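Clients sometimes re-filter detections locally with a stricter threshold than the one sent to the service. The sketch below assumes `bbox` is `[x1, y1, x2, y2]` pixel corners; the guide does not specify the coordinate layout, so treat that as an assumption:

```python
def filter_detections(detections, conf=0.25):
    """Re-filter {"class", "confidence", "bbox"} objects client-side.

    Assumes bbox is [x1, y1, x2, y2] pixel corners (not documented here);
    adds a derived "area" field for each kept detection.
    """
    kept = []
    for det in detections:
        if det["confidence"] < conf:
            continue
        x1, y1, x2, y2 = det["bbox"]
        kept.append({**det, "area": (x2 - x1) * (y2 - y1)})
    return kept
```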
## Qdrant — vector storage & similarity search
The Qdrant integration lets you store image embeddings and find visually similar images. Embeddings are generated automatically by the CLIP service.
### Upsert (store) an image by URL

```sh
curl -X POST https://vision.klevze.net/vectors/upsert \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"img-001","metadata":{"category":"wallpaper","source":"upload"}}'
```

Parameters:

- `url` (required): image URL to embed and store.
- `id` (optional): custom string ID for the point; auto-generated if omitted.
- `metadata` (optional): arbitrary key-value payload stored alongside the vector.
- `collection` (optional): target collection name (defaults to `images`).
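Building the upsert body programmatically mirrors the parameter list above: omitting the optional fields lets the service apply its documented defaults (auto-generated `id`, `images` collection). A minimal sketch; the helper name is illustrative:

```python
import json

def build_upsert_body(url, point_id=None, metadata=None, collection=None):
    """Build the JSON body for /vectors/upsert.

    Optional fields are omitted entirely so the service falls back to its
    documented defaults (auto-generated id, the `images` collection).
    """
    payload = {"url": url}
    if point_id is not None:
        payload["id"] = point_id
    if metadata is not None:
        payload["metadata"] = metadata
    if collection is not None:
        payload["collection"] = collection
    return json.dumps(payload).encode("utf-8")
```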
### Upsert by file upload

```sh
curl -X POST https://vision.klevze.net/vectors/upsert/file \
  -F "file=@/path/to/image.webp" \
  -F 'id=img-002' \
  -F 'metadata_json={"category":"photo"}'
```
### Upsert a pre-computed vector

```sh
curl -X POST https://vision.klevze.net/vectors/upsert/vector \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"id":"img-003","metadata":{"custom":"data"}}'
```
### Search similar images by URL

```sh
curl -X POST https://vision.klevze.net/vectors/search \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

Parameters:

- `url` (required): query image URL.
- `limit` (optional, default 5): number of results.
- `score_threshold` (optional): minimum cosine similarity (0.0–1.0).
- `filter_metadata` (optional): filter results by metadata, e.g. `{"category":"wallpaper"}`.
- `collection` (optional): collection to search.

Returns: a list of `{"id", "score", "metadata"}` objects sorted by similarity.
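The `score` compared against `score_threshold` is cosine similarity: the dot product of the two embeddings divided by the product of their norms, which is 1.0 for identical directions and 0.0 for orthogonal vectors. A small standalone illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|).

    This is the similarity measure behind the `score` field when a
    collection uses the cosine distance.
    """
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```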
### Search by file upload

```sh
curl -X POST https://vision.klevze.net/vectors/search/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```
### Search by pre-computed vector

```sh
curl -X POST https://vision.klevze.net/vectors/search/vector \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"limit":5}'
```
### Collection management

List all collections:

```sh
curl https://vision.klevze.net/vectors/collections
```

Get collection info:

```sh
curl https://vision.klevze.net/vectors/collections/images
```

Create a custom collection:

```sh
curl -X POST https://vision.klevze.net/vectors/collections \
  -H "Content-Type: application/json" \
  -d '{"name":"my_collection","vector_dim":512,"distance":"cosine"}'
```

Delete a collection:

```sh
curl -X DELETE https://vision.klevze.net/vectors/collections/my_collection
```
### Delete points

```sh
curl -X POST https://vision.klevze.net/vectors/delete \
  -H "Content-Type: application/json" \
  -d '{"ids":["img-001","img-002"]}'
```
### Get a point by ID

```sh
curl https://vision.klevze.net/vectors/points/img-001
```
## Request/Response notes

- For URL requests, use `Content-Type: application/json`.
- For uploads, use `multipart/form-data` with a `file` field.
- The gateway aggregates and normalizes outputs for `/analyze/all`.
## Running a single service

To run only one service via Docker Compose:

```sh
docker compose up -d --build clip
```

Or run it locally (in a Python environment) from the service folder:

```sh
# inside clip/, blip/, or yolo/
uvicorn main:app --host 0.0.0.0 --port 8000
```
## Production tips
- Add authentication (API keys or OAuth) at the gateway.
- Add rate-limiting and per-client quotas.
- Keep model services on an internal Docker network.
- For GPU: enable NVIDIA runtime and update service Dockerfiles / compose profiles.
## Troubleshooting

- Service fails to start: check `docker compose logs <service>` for model load errors.
- High memory / OOM: increase host memory or reduce the model footprint; consider GPUs.
- Slow startup: model weights are loaded when a service starts, so expect some extra time.
## Extending

- Swap or update the model in each service by editing that service's `main.py`.
- Add request validation, timeouts, and retries in the gateway to improve robustness.
## Files of interest

- `docker-compose.yml` — composition and service definitions.
- `gateway/` — gateway FastAPI server.
- `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
- `qdrant/` — Qdrant API wrapper service (FastAPI).
- `common/` — shared helpers (e.g., image I/O).