docs: update README and USAGE for card-renderer, Qdrant optimization endpoints, and search params

fix: quantization in /configure, HNSW defaults in POST /collections, filter_metadata in search/file
fix(qdrant): complete optimization gaps from v1
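
The search changes in this set add optional query parameters and a JSON-string filter for the multipart endpoint. As a rough client-side sketch (the helper names `build_search_payload` and `to_form_fields` are hypothetical and are not part of this repository), the same options could be mapped onto both request shapes like this:

```python
import json
from typing import Any, Dict, Optional


def build_search_payload(
    url: str,
    limit: int = 5,
    score_threshold: Optional[float] = None,
    hnsw_ef: Optional[int] = None,
    exact: bool = False,
    indexed_only: bool = False,
    filter_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: JSON body for POST /vectors/search.

    Optional fields that are unset are omitted so the service defaults apply.
    """
    payload: Dict[str, Any] = {
        "url": url,
        "limit": limit,
        "exact": exact,
        "indexed_only": indexed_only,
    }
    if score_threshold is not None:
        payload["score_threshold"] = score_threshold
    if hnsw_ef is not None:
        payload["hnsw_ef"] = hnsw_ef
    if filter_metadata:
        payload["filter_metadata"] = filter_metadata
    return payload


def to_form_fields(payload: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: form fields for POST /vectors/search/file.

    The image is sent separately as the `file` part, so `url` is dropped,
    and the filter object travels as the JSON string `filter_metadata_json`.
    """
    fields: Dict[str, str] = {}
    for key, value in payload.items():
        if key in ("url", "filter_metadata"):
            continue
        # Booleans are lowercased to match the curl examples ("true"/"false").
        fields[key] = str(value).lower() if isinstance(value, bool) else str(value)
    if "filter_metadata" in payload:
        fields["filter_metadata_json"] = json.dumps(payload["filter_metadata"])
    return fields
```

The resulting dictionaries can be passed to any HTTP client as the JSON body or as multipart form fields, together with the `X-API-Key` header.
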
2026-03-31 20:16:55 +02:00 · 2026-03-31 20:08:58 +02:00 · 2026-03-31 20:01:52 +02:00 · 2026-03-31 19:58:47 +02:00
5 changed files with 575 additions and 18 deletions
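
The new `/vectors/inspect` endpoint reports a `ram_estimate_mb` computed as vectors × dimensions × 4 bytes × a 1.5 safety factor, expressed in MiB. The sketch below reproduces that arithmetic in isolation; the function name `estimate_ram_mb` and the `bytes_per_dim` parameter are assumptions added for illustration (the endpoint itself always uses 4 bytes per dimension), so the int8 figure is only an approximation of what scalar quantization might save.

```python
def estimate_ram_mb(
    vec_count: int,
    dim: int = 512,
    bytes_per_dim: int = 4,
    safety_factor: float = 1.5,
) -> float:
    """Rough RAM footprint in MiB, rounded to one decimal place.

    bytes_per_dim=4 mirrors the float32 estimate used by the inspect endpoint;
    bytes_per_dim=1 is an illustrative stand-in for int8-quantized vectors.
    """
    return round(vec_count * dim * bytes_per_dim * safety_factor / 1_048_576, 1)


# One million 512-dimensional float32 vectors.
print(estimate_ram_mb(1_000_000))                   # 2929.7
# The same collection if only int8-quantized vectors were counted.
print(estimate_ram_mb(1_000_000, bytes_per_dim=1))  # 732.4
```

The ratio between the two figures is the roughly 4× reduction that the `quantization_type` documentation refers to.
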
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Skinbase Vision Stack (CLIP + BLIP + YOLO + Qdrant) – Dockerized FastAPI
+# Skinbase Vision Stack (CLIP + BLIP + YOLO + Qdrant + Card Renderer) – Dockerized FastAPI

-This repository provides **four standalone vision services** (CLIP / BLIP / YOLO / Qdrant)
+This repository provides **five standalone vision services** (CLIP / BLIP / YOLO / Qdrant / Card Renderer)
 and a **Gateway API** that can call them individually or together.

 ## Services & Ports
@@ -11,6 +11,7 @@ and a **Gateway API** that can call them individually or together.
 - `yolo`: internal only
 - `qdrant`: vector DB (port `6333` exposed for direct access)
 - `qdrant-svc`: internal Qdrant API wrapper
+- `card-renderer`: internal card rendering service

 ## Run

@@ -129,14 +130,17 @@ curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/up
 ```bash
 curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search \
  -H "Content-Type: application/json" \
-  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
+  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5,"filter_metadata":{"is_public":true}}'
 ```

+Optional search parameters: `hnsw_ef` (int), `exact` (bool), `indexed_only` (bool), `score_threshold` (float), `filter_metadata` (object).
+
 ### Search similar images by file upload
 ```bash
 curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search/file \
  -F "file=@/path/to/image.webp" \
-  -F "limit=5"
+  -F "limit=5" \
+  -F 'filter_metadata_json={"is_public":true}'
 ```

 ### List collections
@@ -149,6 +153,38 @@ curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collection
 curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images
 ```

+### Full diagnostic inspect
+```bash
+curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/inspect
+```
+
+Returns HNSW config, optimizer config, quantization, segment count, payload index coverage, and RAM estimate for every collection.
+
+### Payload index management
+```bash
+# List indexes
+curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images/indexes
+
+# Create a single index
+curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/indexes \
+  -H "Content-Type: application/json" \
+  -d '{"field":"is_public","type":"bool"}'
+
+# Ensure multiple indexes (idempotent)
+curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
+  -H "Content-Type: application/json" \
+  -d '{"fields":[{"field":"is_public","type":"bool"},{"field":"category_id","type":"integer"}]}'
+```
+
+Supported index types: `keyword`, `integer`, `float`, `bool`, `geo`, `datetime`, `text`, `uuid`.
+
+### Collection configuration (HNSW / optimizer / quantization)
+```bash
+curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/configure \
+  -H "Content-Type: application/json" \
+  -d '{"hnsw_m":16,"hnsw_ef_construct":200,"indexing_threshold":20000,"quantization_type":"int8"}'
+```
+
 ### Delete points
 ```bash
 curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/delete \
@@ -158,6 +194,38 @@ curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/de

 If you let the wrapper generate a UUID, use the returned `id` value for later `get`, `search`, or `delete` operations.

+## Card Renderer
+
+### List available templates
+```bash
+curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/cards/templates
+```
+
+### Render a card from a URL
+```bash
+curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render \
+  -H "Content-Type: application/json" \
+  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title","username":"@artist","template":"nova-artwork-v1"}'
+```
+
+Returns binary image bytes (WebP by default).
+
+### Render a card from a file upload
+```bash
+curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/file \
+  -F "file=@/path/to/image.webp" \
+  -F "title=Artwork Title" \
+  -F "username=@artist" \
+  -F "template=nova-artwork-v1"
+```
+
+### Get card layout metadata (no image rendered)
+```bash
+curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/meta \
+  -H "Content-Type: application/json" \
+  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
+```
+
 ## Notes

 - This is a **starter scaffold**. Models are loaded at service startup.
--- a/USAGE.md
+++ b/USAGE.md
@@ -4,7 +4,7 @@ This document explains how to run and use the Skinbase Vision Stack (Gateway + C

 ## Overview

-- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc` (FastAPI each, except `qdrant` which is the official Qdrant DB).
+- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc`, `card-renderer` (FastAPI each, except `qdrant` which is the official Qdrant DB).
 - Gateway is the public API endpoint; the other services are internal.

 ## Model overview
@@ -17,6 +17,8 @@ This document explains how to run and use the Skinbase Vision Stack (Gateway + C

 - **Qdrant**: High-performance vector similarity search engine. Stores CLIP image embeddings and enables reverse image search (find similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.

+- **Card Renderer**: Generates branded social-card images (e.g. Open Graph previews) from artwork images. Applies smart center-weighted cropping, gradient overlays, title/username/tag text, and an optional logo. Returns binary image bytes (WebP by default). Template: `nova-artwork-v1`.
+
 ## Prerequisites

 - Docker Desktop (with `docker compose`) or a Docker environment.
@@ -219,8 +221,11 @@ Parameters:
 - `url` (required): query image URL.
 - `limit` (optional, default 5): number of results.
 - `score_threshold` (optional): minimum cosine similarity (0.0–1.0).
-- `filter_metadata` (optional): filter results by metadata, e.g. `{"category":"wallpaper"}`.
+- `filter_metadata` (optional): filter results by payload fields, e.g. `{"is_public":true,"category_id":3}`.
 - `collection` (optional): collection to search.
+- `hnsw_ef` (optional, int): override the HNSW ef parameter at query time. Higher = better recall, slightly more latency.
+- `exact` (optional, bool, default false): brute-force exact search. Avoid on large collections.
+- `indexed_only` (optional, bool, default false): restrict search to fully indexed segments only. Useful during bulk ingest.

 Return: list of `{"id", "score", "metadata"}` sorted by similarity.

@@ -230,16 +235,19 @@ Return: list of `{"id", "score", "metadata"}` sorted by similarity.
 curl -X POST https://vision.klevze.net/vectors/search/file \
  -H "X-API-Key: <your-api-key>" \
  -F "file=@/path/to/image.webp" \
-  -F "limit=5"
+  -F "limit=5" \
+  -F 'filter_metadata_json={"is_public":true}'
 ```

+All URL search parameters are available as form fields; use `filter_metadata_json` (JSON string) for filters.
+
 #### Search by pre-computed vector

 ```bash
 curl -X POST https://vision.klevze.net/vectors/search/vector \
  -H "X-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
-  -d '{"vector":[0.1,0.2,...],"limit":5}'
+  -d '{"vector":[0.1,0.2,...],"limit":5,"hnsw_ef":128}'
 ```

 #### Collection management
@@ -267,6 +275,67 @@ Delete a collection:
 curl -H "X-API-Key: <your-api-key>" -X DELETE https://vision.klevze.net/vectors/collections/my_collection
 ```

+#### Full diagnostic inspect
+
+Returns HNSW config, optimizer config, quantization, segment count, payload index coverage percentages, and RAM footprint estimate for every collection.
+
+```bash
+curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/inspect
+```
+
+#### Payload index management
+
+Payload indexes are critical for fast filtered vector search. Always create indexes for fields used in `filter_metadata` filters.
+
+```bash
+# List existing indexes
+curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images/indexes
+
+# Create a single index
+curl -X POST https://vision.klevze.net/vectors/collections/images/indexes \
+  -H "X-API-Key: <your-api-key>" \
+  -H "Content-Type: application/json" \
+  -d '{"field":"is_public","type":"bool"}'
+
+# Ensure multiple indexes exist (idempotent — safe to run multiple times)
+curl -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
+  -H "X-API-Key: <your-api-key>" \
+  -H "Content-Type: application/json" \
+  -d '{"fields":[{"field":"is_public","type":"bool"},{"field":"is_deleted","type":"bool"},{"field":"category_id","type":"integer"},{"field":"user_id","type":"keyword"}]}'
+```
+
+Supported index types: `keyword`, `integer`, `float`, `bool`, `geo`, `datetime`, `text`, `uuid`.
+
+#### Collection configuration (HNSW / optimizer / quantization)
+
+Updates HNSW, optimizer, or scalar quantization settings on an existing collection without data loss. HNSW graph and segment changes apply to newly created segments.
+
+```bash
+curl -X POST https://vision.klevze.net/vectors/collections/images/configure \
+  -H "X-API-Key: <your-api-key>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "hnsw_m": 16,
+    "hnsw_ef_construct": 200,
+    "hnsw_on_disk": false,
+    "indexing_threshold": 20000,
+    "default_segment_number": 4,
+    "quantization_type": "int8",
+    "quantization_quantile": 0.99,
+    "quantization_always_ram": true
+  }'
+```
+
+Parameters:
+- `hnsw_m` (int, 4–64): edges per node in the HNSW graph.
+- `hnsw_ef_construct` (int, 10–1000): ef during index construction.
+- `hnsw_on_disk` (bool): store HNSW graph on disk (saves RAM, slightly slower queries).
+- `indexing_threshold` (int): segment size, in kilobytes of vector data, above which Qdrant builds the HNSW index (`0` disables indexing).
+- `default_segment_number` (int, 1–32): target segment count for parallelism.
+- `quantization_type` (string, `"int8"` or null): enable scalar quantization (~4× RAM reduction).
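+- `quantization_quantile` (float, 0.5–1.0, default 0.99): quantile of vector component values kept inside the quantization range; at 0.99 the most extreme 1% of values are clipped.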
+- `quantization_always_ram` (bool, default true): keep quantized vectors in RAM.
+
 #### Delete points

 ```bash
@@ -290,6 +359,67 @@ If the wrapper had to replace your string `id` with a generated UUID, the origin
 curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/points/by-original-id/img-001
 ```

+## Card Renderer
+
+The card renderer generates branded social-card images from artwork photos. It applies smart center-weighted cropping, a gradient overlay, title/subtitle/username/category text, optional tags, and an optional logo.
+
+Default output: 1200×630 WebP (`nova-artwork-v1` template).
+
+### List available templates
+
+```bash
+curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/cards/templates
+```
+
+### Render a card from a URL
+
+```bash
+curl -X POST https://vision.klevze.net/cards/render \
+  -H "X-API-Key: <your-api-key>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "url": "https://files.skinbase.org/img/aa/bb/cc/md.webp",
+    "title": "Artwork Title",
+    "subtitle": "Optional subtitle",
+    "username": "@artist",
+    "category": "Digital Art",
+    "tags": ["surreal", "landscape"],
+    "template": "nova-artwork-v1",
+    "width": 1200,
+    "height": 630,
+    "output": "webp",
+    "quality": 90,
+    "show_logo": true
+  }'
+```
+
+Returns binary image bytes with `Content-Type: image/webp`.
+
+### Render a card from a file upload
+
+```bash
+curl -X POST https://vision.klevze.net/cards/render/file \
+  -H "X-API-Key: <your-api-key>" \
+  -F "file=@/path/to/image.webp" \
+  -F "title=Artwork Title" \
+  -F "username=@artist" \
+  -F "template=nova-artwork-v1" \
+  -F "show_logo=true"
+```
+
+Returns binary image bytes.
+
+### Get card layout metadata (no image rendered)
+
+```bash
+curl -X POST https://vision.klevze.net/cards/render/meta \
+  -H "X-API-Key: <your-api-key>" \
+  -H "Content-Type: application/json" \
+  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
+```
+
+Returns crop coordinates and layout data without producing an image.
+
 ## Request/Response notes

 - For URL requests use `Content-Type: application/json`.
@@ -340,9 +470,5 @@ uvicorn main:app --host 0.0.0.0 --port 8000
 - `gateway/` — gateway FastAPI server.
 - `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
 - `qdrant/` — Qdrant API wrapper service (FastAPI).
+- `card-renderer/` — card rendering service (FastAPI).
 - `common/` — shared helpers (e.g., image I/O).
-
----
-
-If you want, I can merge these same contents into the project `README.md`,
-create a Postman collection, or add example response schemas for each endpoint.
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -77,6 +77,7 @@ services:
      - CLIP_URL=http://clip:8000
      - COLLECTION_NAME=images
      - VECTOR_DIM=512
+      - SEARCH_HNSW_EF=128
    depends_on:
      qdrant:
        condition: service_healthy
--- a/gateway/main.py
+++ b/gateway/main.py
@@ -243,13 +243,21 @@ async def vectors_search_file(
    limit: int = Form(5),
    score_threshold: Optional[float] = Form(None),
    collection: Optional[str] = Form(None),
+    hnsw_ef: Optional[int] = Form(None),
+    exact: bool = Form(False),
+    indexed_only: bool = Form(False),
+    filter_metadata_json: Optional[str] = Form(None),
 ):
    data = await file.read()
-    fields: Dict[str, Any] = {"limit": int(limit)}
+    fields: Dict[str, Any] = {"limit": int(limit), "exact": exact, "indexed_only": indexed_only}
    if score_threshold is not None:
        fields["score_threshold"] = float(score_threshold)
    if collection is not None:
        fields["collection"] = collection
+    if hnsw_ef is not None:
+        fields["hnsw_ef"] = int(hnsw_ef)
+    if filter_metadata_json is not None:
+        fields["filter_metadata_json"] = filter_metadata_json
    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
        return await _post_file(client, f"{QDRANT_SVC_URL}/search/file", data, fields)

@@ -284,6 +292,13 @@ async def vectors_collection_info(name: str):
        return await _get_json(client, f"{QDRANT_SVC_URL}/collections/{name}")


+@app.get("/vectors/inspect")
+async def vectors_inspect():
+    """Full diagnostic summary for all Qdrant collections (HNSW, optimizer, payload indexes, RAM estimate)."""
+    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
+        return await _get_json(client, f"{QDRANT_SVC_URL}/inspect")
+
+
 @app.delete("/vectors/collections/{name}")
 async def vectors_delete_collection(name: str):
    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
@@ -416,3 +431,33 @@ async def cards_render_meta(payload: Dict[str, Any]):
    """Return crop and layout metadata for a card render (no image produced)."""
    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
        return await _post_json(client, f"{CARD_RENDERER_URL}/render/meta", payload)
+
+
+# ---- Qdrant administration endpoints (index management + collection config) ----
+
+@app.get("/vectors/collections/{name}/indexes")
+async def vectors_collection_indexes(name: str):
+    """List payload indexes for a collection."""
+    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
+        return await _get_json(client, f"{QDRANT_SVC_URL}/collections/{name}/indexes")
+
+
+@app.post("/vectors/collections/{name}/indexes")
+async def vectors_create_payload_index(name: str, payload: Dict[str, Any]):
+    """Create a payload index on a field in a collection."""
+    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
+        return await _post_json(client, f"{QDRANT_SVC_URL}/collections/{name}/indexes", payload)
+
+
+@app.post("/vectors/collections/{name}/ensure-indexes")
+async def vectors_ensure_indexes(name: str, payload: Dict[str, Any]):
+    """Idempotently ensure payload indexes exist for a list of fields."""
+    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
+        return await _post_json(client, f"{QDRANT_SVC_URL}/collections/{name}/ensure-indexes", payload)
+
+
+@app.post("/vectors/collections/{name}/configure")
+async def vectors_configure_collection(name: str, payload: Dict[str, Any]):
+    """Update HNSW, optimizer, and scalar quantization configuration for a collection."""
+    async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
+        return await _post_json(client, f"{QDRANT_SVC_URL}/collections/{name}/configure", payload)
--- a/qdrant/main.py
+++ b/qdrant/main.py
@@ -16,6 +16,12 @@ from qdrant_client.models import (
    Filter,
    FieldCondition,
    MatchValue,
+    HnswConfigDiff,
+    OptimizersConfigDiff,
+    SearchParams,
+    PayloadSchemaType,
+    ScalarQuantizationConfig,
+    ScalarType,
 )

 # ---------------------------------------------------------------------------
@@ -27,6 +33,8 @@ QDRANT_PORT = int(os.getenv("QDRANT_PORT", "6333"))
 CLIP_URL = os.getenv("CLIP_URL", "http://clip:8000")
 COLLECTION_NAME = os.getenv("COLLECTION_NAME", "images")
 VECTOR_DIM = int(os.getenv("VECTOR_DIM", "512"))
+# hnsw_ef at query time: higher = better recall, slightly more latency (Qdrant default ~100)
+SEARCH_HNSW_EF = int(os.getenv("SEARCH_HNSW_EF", "128"))

 app = FastAPI(title="Skinbase Qdrant Service", version="1.0.0")
 client: QdrantClient = None  # type: ignore[assignment]
@@ -44,12 +52,21 @@ def startup():


 def _ensure_collection():
-    """Create the default collection if it does not exist yet."""
+    """Create the default collection with production-friendly defaults if it does not exist yet."""
    collections = [c.name for c in client.get_collections().collections]
    if COLLECTION_NAME not in collections:
        client.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(size=VECTOR_DIM, distance=Distance.COSINE),
+            hnsw_config=HnswConfigDiff(
+                m=16,
+                ef_construct=200,  # higher than default 100 = better index quality
+                on_disk=False,     # keep HNSW graph in RAM for fast traversal
+            ),
+            optimizers_config=OptimizersConfigDiff(
+                indexing_threshold=20000,    # build the HNSW index once a segment exceeds ~20,000 kB of vectors
+                default_segment_number=4,    # parallelism-friendly segment count
+            ),
        )


@@ -77,6 +94,9 @@ class SearchUrlRequest(BaseModel):
    score_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0)
    collection: Optional[str] = None
    filter_metadata: Dict[str, Any] = Field(default_factory=dict)
+    hnsw_ef: Optional[int] = Field(default=None, ge=1, le=512, description="Override ef at query time. Higher = better recall, slightly higher latency.")
+    exact: bool = Field(default=False, description="Brute-force exact search. Avoid on large collections.")
+    indexed_only: bool = Field(default=False, description="Search only fully indexed segments. Useful during bulk ingest.")


 class SearchVectorRequest(BaseModel):
@@ -85,6 +105,9 @@ class SearchVectorRequest(BaseModel):
    score_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0)
    collection: Optional[str] = None
    filter_metadata: Dict[str, Any] = Field(default_factory=dict)
+    hnsw_ef: Optional[int] = Field(default=None, ge=1, le=512)
+    exact: bool = False
+    indexed_only: bool = False


 class DeleteRequest(BaseModel):
@@ -189,6 +212,79 @@ def health():
        return {"status": "error", "detail": str(e)}


+@app.get("/inspect")
+def inspect():
+    """Return a full diagnostic summary for every collection.
+
+    Covers: vector counts, segment counts, HNSW config, optimizer config,
+    quantization, payload indexes and their coverage. Designed for production
+    health checks and the Qdrant optimization workflow.
+    """
+    try:
+        all_collections = client.get_collections().collections
+    except Exception as exc:
+        return {"status": "error", "detail": str(exc)}
+
+    result = {}
+    for col_desc in all_collections:
+        name = col_desc.name
+        try:
+            info = client.get_collection(name)
+            cfg = info.config
+            hnsw = cfg.hnsw_config
+            opt = cfg.optimizer_config
+            quant = cfg.quantization_config
+            params = cfg.params
+
+            # Estimate raw RAM footprint: vectors * dim * 4 bytes * 1.5 safety factor
+            vec_count = info.vectors_count or 0
+            vec_dim = (
+                params.vectors.size
+                if hasattr(params.vectors, "size")
+                else VECTOR_DIM
+            )
+            ram_estimate_mb = round(vec_count * vec_dim * 4 * 1.5 / 1_048_576, 1)
+
+            result[name] = {
+                "status": info.status.value if info.status else None,
+                "optimizer_status": str(info.optimizer_status) if info.optimizer_status else None,
+                "vectors_count": vec_count,
+                "indexed_vectors_count": info.indexed_vectors_count,
+                "points_count": info.points_count,
+                "segments_count": info.segments_count,
+                "ram_estimate_mb": ram_estimate_mb,
+                "hnsw": {
+                    "m": hnsw.m,
+                    "ef_construct": hnsw.ef_construct,
+                    "on_disk": hnsw.on_disk,
+                    "full_scan_threshold": hnsw.full_scan_threshold,
+                    "max_indexing_threads": hnsw.max_indexing_threads,
+                } if hnsw else None,
+                "optimizer": {
+                    "indexing_threshold": opt.indexing_threshold,
+                    "default_segment_number": opt.default_segment_number,
+                    "max_segment_size": opt.max_segment_size,
+                    "memmap_threshold": opt.memmap_threshold,
+                    "flush_interval_sec": opt.flush_interval_sec,
+                } if opt else None,
+                "quantization": str(quant) if quant else None,
+                "payload_indexes": {
+                    k: {
+                        "type": v.data_type.value if hasattr(v.data_type, "value") else str(v.data_type),
+                        "points": v.points,
+                        # Coverage is relative to the point count, since index stats count points.
+                        "coverage_pct": round(v.points / max(info.points_count or 0, 1) * 100, 1),
+                    }
+                    for k, v in (info.payload_schema or {}).items()
+                },
+                "payload_index_count": len(info.payload_schema or {}),
+                "search_hnsw_ef": SEARCH_HNSW_EF,
+            }
+        except Exception as exc:
+            result[name] = {"error": str(exc)}
+
+    return {"collections": result, "total": len(result)}
+
+
 # ---------------------------------------------------------------------------
 # Collection management
 # ---------------------------------------------------------------------------
@@ -204,9 +300,13 @@ def create_collection(req: CollectionRequest):
    if req.name in collections:
        raise HTTPException(409, f"Collection '{req.name}' already exists")

+    # Apply the same production defaults as _ensure_collection so all
+    # collections start with tuned HNSW and optimizer settings.
    client.create_collection(
        collection_name=req.name,
        vectors_config=VectorParams(size=req.vector_dim, distance=dist),
+        hnsw_config=HnswConfigDiff(m=16, ef_construct=200, on_disk=False),
+        optimizers_config=OptimizersConfigDiff(indexing_threshold=20000, default_segment_number=4),
    )
    return {"created": req.name, "vector_dim": req.vector_dim, "distance": req.distance}

@@ -221,11 +321,40 @@ def list_collections():
 def collection_info(name: str):
    try:
        info = client.get_collection(name)
+        cfg = info.config
+        hnsw = cfg.hnsw_config
+        opt = cfg.optimizer_config
+        quant = cfg.quantization_config
        return {
            "name": name,
            "vectors_count": info.vectors_count,
+            "indexed_vectors_count": info.indexed_vectors_count,
            "points_count": info.points_count,
+            "segments_count": info.segments_count,
            "status": info.status.value if info.status else None,
+            "optimizer_status": str(info.optimizer_status) if info.optimizer_status else None,
+            "hnsw": {
+                "m": hnsw.m,
+                "ef_construct": hnsw.ef_construct,
+                "on_disk": hnsw.on_disk,
+                "full_scan_threshold": hnsw.full_scan_threshold,
+                "max_indexing_threads": hnsw.max_indexing_threads,
+            } if hnsw else None,
+            "optimizer": {
+                "indexing_threshold": opt.indexing_threshold,
+                "default_segment_number": opt.default_segment_number,
+                "max_segment_size": opt.max_segment_size,
+                "memmap_threshold": opt.memmap_threshold,
+                "flush_interval_sec": opt.flush_interval_sec,
+            } if opt else None,
+            "quantization": str(quant) if quant else None,
+            "payload_schema": {
+                k: {
+                    "type": v.data_type.value if hasattr(v.data_type, "value") else str(v.data_type),
+                    "points": v.points,
+                }
+                for k, v in (info.payload_schema or {}).items()
+            },
        }
    except Exception as e:
        raise HTTPException(404, str(e))
@@ -325,7 +454,7 @@ def upsert_vector(req: UpsertVectorRequest):
 async def search_url(req: SearchUrlRequest):
    """Embed an image by URL via CLIP, then search Qdrant for similar vectors."""
    vector = await _embed_url(req.url)
-    return _do_search(vector, req.limit, req.score_threshold, req.collection, req.filter_metadata)
+    return _do_search(vector, req.limit, req.score_threshold, req.collection, req.filter_metadata, req.hnsw_ef, req.exact, req.indexed_only)


 @app.post("/search/file")
@@ -334,17 +463,28 @@ async def search_file(
    limit: int = Form(5),
    score_threshold: Optional[float] = Form(None),
    collection: Optional[str] = Form(None),
+    hnsw_ef: Optional[int] = Form(None),
+    exact: bool = Form(False),
+    indexed_only: bool = Form(False),
+    filter_metadata_json: Optional[str] = Form(None),
 ):
    """Embed an uploaded image via CLIP, then search Qdrant for similar vectors."""
+    import json
+    filter_metadata: Dict[str, Any] = {}
+    if filter_metadata_json:
+        try:
+            filter_metadata = json.loads(filter_metadata_json)
+        except json.JSONDecodeError:
+            raise HTTPException(400, "filter_metadata_json must be valid JSON")
    data = await file.read()
    vector = await _embed_bytes(data)
-    return _do_search(vector, int(limit), score_threshold, collection, {})
+    return _do_search(vector, int(limit), score_threshold, collection, filter_metadata, hnsw_ef, exact, indexed_only)


 @app.post("/search/vector")
 def search_vector(req: SearchVectorRequest):
    """Search Qdrant using a pre-computed vector."""
-    return _do_search(req.vector, req.limit, req.score_threshold, req.collection, req.filter_metadata)
+    return _do_search(req.vector, req.limit, req.score_threshold, req.collection, req.filter_metadata, req.hnsw_ef, req.exact, req.indexed_only)


 def _do_search(
@@ -353,9 +493,13 @@ def _do_search(
    score_threshold: Optional[float],
    collection: Optional[str],
    filter_metadata: Dict[str, Any],
+    hnsw_ef: Optional[int] = None,
+    exact: bool = False,
+    indexed_only: bool = False,
 ):
    col = _col(collection)
    qfilter = _build_filter(filter_metadata)
+    ef = hnsw_ef if hnsw_ef is not None else SEARCH_HNSW_EF

    results = client.query_points(
        collection_name=col,
@@ -363,6 +507,7 @@ def _do_search(
        limit=limit,
        score_threshold=score_threshold,
        query_filter=qfilter,
+        search_params=SearchParams(hnsw_ef=ef, exact=exact, indexed_only=indexed_only),
    )

    hits = []
@@ -438,3 +583,175 @@ def get_point_by_original_id(original_id: str, collection: Optional[str] = None)
        raise
    except Exception as e:
        raise HTTPException(404, str(e))
+
+
+# ---------------------------------------------------------------------------
+# Payload index management
+# ---------------------------------------------------------------------------
+
+_SCHEMA_TYPE_MAP: Dict[str, PayloadSchemaType] = {
+    t.value: t for t in PayloadSchemaType
+}
+
+
+def _resolve_schema_type(type_str: str) -> PayloadSchemaType:
+    schema = _SCHEMA_TYPE_MAP.get(type_str.lower())
+    if schema is None:
+        raise HTTPException(400, f"Unknown index type '{type_str}'. Valid: {', '.join(_SCHEMA_TYPE_MAP)}")
+    return schema
+
+
+class PayloadIndexRequest(BaseModel):
+    field: str
+    type: str = Field(default="keyword", description="keyword | integer | float | bool | geo | datetime | text | uuid")
+    collection: Optional[str] = None
+
+
+class EnsureIndexesRequest(BaseModel):
+    """List of field specs, each with 'field' and optional 'type' keys."""
+    fields: List[Dict[str, str]]
+    collection: Optional[str] = None
+
+
+@app.get("/collections/{name}/indexes")
+def collection_indexes(name: str):
+    """List all payload indexes for a collection."""
+    try:
+        info = client.get_collection(name)
+        schema = info.payload_schema or {}
+        return {
+            "collection": name,
+            "indexes": {
+                k: {
+                    "type": v.data_type.value if hasattr(v.data_type, "value") else str(v.data_type),
+                    "points": v.points,
+                }
+                for k, v in schema.items()
+            },
+            "count": len(schema),
+        }
+    except Exception as e:
+        raise HTTPException(404, str(e))
+
+
+@app.post("/collections/{name}/indexes")
+def create_index(name: str, req: PayloadIndexRequest):
+    """Create a payload index on a single field."""
+    col = req.collection or name
+    schema = _resolve_schema_type(req.type)
+    try:
+        client.create_payload_index(
+            collection_name=col,
+            field_name=req.field,
+            field_schema=schema,
+        )
+        return {"collection": col, "field": req.field, "type": req.type, "status": "created"}
+    except Exception as e:
+        raise HTTPException(500, str(e))
+
+
+@app.post("/collections/{name}/ensure-indexes")
+def ensure_indexes(name: str, req: EnsureIndexesRequest):
+    """Idempotently ensure payload indexes exist for a list of fields.
+
+    Skips fields that are already indexed; only creates the missing ones.
+    Example body: {"fields": [{"field": "is_public", "type": "bool"}, {"field": "category_id", "type": "integer"}]}
+    """
+    col = req.collection or name
+    try:
+        info = client.get_collection(col)
+    except Exception as e:
+        raise HTTPException(404, str(e))
+
+    existing = set(info.payload_schema.keys()) if info.payload_schema else set()
+    created: List[str] = []
+    skipped: List[str] = []
+
+    for field_spec in req.fields:
+        field = field_spec.get("field")
+        type_str = field_spec.get("type", "keyword")
+        if not field:
+            raise HTTPException(400, "Each field spec must include a 'field' key")
+        if field in existing:
+            skipped.append(field)
+            continue
+        schema = _resolve_schema_type(type_str)
+        try:
+            client.create_payload_index(
+                collection_name=col,
+                field_name=field,
+                field_schema=schema,
+            )
+            created.append(field)
+        except Exception as exc:
+            raise HTTPException(500, f"Failed to index '{field}': {exc}")
+
+    return {"collection": col, "created": created, "skipped": skipped}
+
+
+# ---------------------------------------------------------------------------
+# Collection HNSW + optimizer configuration
+# ---------------------------------------------------------------------------
+
+class CollectionConfigRequest(BaseModel):
+    hnsw_m: Optional[int] = Field(default=None, ge=4, le=64, description="Edges per node in the HNSW graph.")
+    hnsw_ef_construct: Optional[int] = Field(default=None, ge=10, le=1000, description="ef during index construction. Changes apply to new segments only.")
+    hnsw_on_disk: Optional[bool] = Field(default=None, description="Store HNSW graph on disk (saves RAM, slightly slower queries).")
+    indexing_threshold: Optional[int] = Field(default=None, ge=0, description="Segment size (KB of vectors) above which Qdrant builds the HNSW index for that segment.")
+    default_segment_number: Optional[int] = Field(default=None, ge=1, le=32, description="Target number of segments for parallelism.")
+    # Scalar quantization — reduces RAM ~4x, often speeds up search on large collections.
+    # Set quantization_type='int8' to enable. Use always_ram=True to keep quantized
+    # vectors in RAM (recommended on VPS with limited memory but fast disk).
+    quantization_type: Optional[str] = Field(default=None, description="Enable scalar quantization: 'int8'. Set to null to keep current setting.")
+    quantization_quantile: float = Field(default=0.99, ge=0.5, le=1.0, description="Quantile of vector component values used to set the quantization bounds; values beyond it are treated as outliers (0.99 recommended).")
+    quantization_always_ram: bool = Field(default=True, description="Keep quantized vectors in RAM even when raw vectors are on disk.")
+
+
+@app.post("/collections/{name}/configure")
+def configure_collection(name: str, req: CollectionConfigRequest):
+    """Apply HNSW, optimizer, and scalar quantization updates to an existing collection.
+
+    Changes are applied in-place without data loss or re-ingestion.
+    Note: hnsw_m and hnsw_ef_construct only affect newly created segments.
+    """
+    hnsw_kwargs = {k: v for k, v in {
+        "m": req.hnsw_m,
+        "ef_construct": req.hnsw_ef_construct,
+        "on_disk": req.hnsw_on_disk,
+    }.items() if v is not None}
+
+    opt_kwargs = {k: v for k, v in {
+        "indexing_threshold": req.indexing_threshold,
+        "default_segment_number": req.default_segment_number,
+    }.items() if v is not None}
+
+    # Build optional scalar quantization config
+    quant_config = None
+    if req.quantization_type is not None:
+        if req.quantization_type.lower() != "int8":
+            raise HTTPException(400, f"Unsupported quantization_type '{req.quantization_type}'. Only 'int8' is supported.")
+        quant_config = ScalarQuantizationConfig(
+            type=ScalarType.INT8,
+            quantile=req.quantization_quantile,
+            always_ram=req.quantization_always_ram,
+        )
+
+    if not hnsw_kwargs and not opt_kwargs and quant_config is None:
+        raise HTTPException(400, "No configuration fields provided")
+
+    try:
+        client.update_collection(
+            collection_name=name,
+            hnsw_config=HnswConfigDiff(**hnsw_kwargs) if hnsw_kwargs else None,
+            optimizers_config=OptimizersConfigDiff(**opt_kwargs) if opt_kwargs else None,
+            quantization_config=quant_config,
+        )
+        return {
+            "collection": name,
+            "status": "updated",
+            "hnsw_changes": hnsw_kwargs,
+            "optimizer_changes": opt_kwargs,
+            "quantization": {"type": req.quantization_type, "quantile": req.quantization_quantile, "always_ram": req.quantization_always_ram} if quant_config else None,
+        }
+    except Exception as exc:
+        raise HTTPException(500, str(exc))
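The `/configure` handler above only forwards fields that are not `None`, so a client can send a partial body. A minimal sketch of building such a request follows, assuming the gateway exposes the endpoint at `/vectors/collections/{name}/configure` and authenticates with the `X-API-Key` header as in the README curl examples. The function name `build_configure_request` is hypothetical and not part of the repository.

```python
import json
from typing import Optional
from urllib.request import Request


def build_configure_request(
    base_url: str,
    collection: str,
    api_key: str,
    *,
    hnsw_m: Optional[int] = None,
    hnsw_ef_construct: Optional[int] = None,
    hnsw_on_disk: Optional[bool] = None,
    indexing_threshold: Optional[int] = None,
    default_segment_number: Optional[int] = None,
    quantization_type: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST request for the configure endpoint.

    Hypothetical helper: field names mirror CollectionConfigRequest;
    the URL layout is an assumption based on the gateway routes.
    """
    candidates = {
        "hnsw_m": hnsw_m,
        "hnsw_ef_construct": hnsw_ef_construct,
        "hnsw_on_disk": hnsw_on_disk,
        "indexing_threshold": indexing_threshold,
        "default_segment_number": default_segment_number,
        "quantization_type": quantization_type,
    }
    # Same idea as the server side: drop unset fields so they stay unchanged.
    body = {k: v for k, v in candidates.items() if v is not None}
    if not body:
        raise ValueError("No configuration fields provided")

    url = f"{base_url.rstrip('/')}/vectors/collections/{collection}/configure"
    return Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_configure_request(
        "https://vision.klevze.net", "images", "<your-api-key>",
        hnsw_ef_construct=200, quantization_type="int8",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. The sketch leaves that step out so it can be inspected without network access.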
Author	SHA1	Message	Date
Gregor Klevze	3f925e17d5	docs: update README and USAGE for card-renderer, Qdrant optimization endpoints, and search params	2026-03-31 20:16:55 +02:00
Gregor Klevze	6ea91c3452	fix: quantization in /configure, HNSW defaults in POST /collections, filter_metadata in search/file	2026-03-31 20:08:58 +02:00
Gregor Klevze	609485a0f0	fix(qdrant): complete optimization gaps from v1	2026-03-31 20:01:52 +02:00
    - qdrant/main.py: search/file now accepts hnsw_ef, exact, indexed_only form fields (was silently ignoring them, using server defaults only)
    - qdrant/main.py: add GET /inspect endpoint: full diagnostic summary for all collections (HNSW, optimizer, quantization, segment count, payload index coverage, raw RAM estimate as vectors * dim * 4B * 1.5)
    - gateway/main.py: vectors/search/file now forwards hnsw_ef, exact, indexed_only
    - gateway/main.py: add GET /vectors/inspect proxy
Gregor Klevze	c7ea347e2b	feat(qdrant): optimization - payload indexes, HNSW tuning, search params (v1)	2026-03-31 19:58:47 +02:00
    Inspection findings:
    - _ensure_collection() created collections with bare VectorParams (no HNSW/optimizer config)
    - _do_search() had no SearchParams and used Qdrant defaults (ef often ~100, no indexed_only)
    - No payload index management at all; filtered searches scanned unindexed fields every time
    - collection_info() returned minimal data; impossible to inspect production state
    - No way to create/ensure payload indexes via the API
    Changes in qdrant/main.py:
    - Add SEARCH_HNSW_EF env var (default 128, above Qdrant default for better recall)
    - _ensure_collection(): configure HnswConfigDiff(m=16, ef_construct=200, on_disk=False) and OptimizersConfigDiff(indexing_threshold=20000, default_segment_number=4) on creation
    - _do_search(): use SearchParams(hnsw_ef, exact, indexed_only) on every query
    - SearchUrlRequest + SearchVectorRequest: expose hnsw_ef, exact, indexed_only per request
    - collection_info(): expand to full HNSW/optimizer/quantization/segment/payload_schema detail
    - GET /collections/{name}/indexes: list all payload indexes
    - POST /collections/{name}/indexes: create a single payload index
    - POST /collections/{name}/ensure-indexes: idempotent bulk index creation (skip existing)
    - POST /collections/{name}/configure: apply HNSW/optimizer changes to existing collections
    Changes in gateway/main.py:
    - Expose the 4 new qdrant-svc endpoints under /vectors/collections/{name}/...
    Changes in docker-compose.yml:
    - Add SEARCH_HNSW_EF=128 to qdrant-svc environment
    Critical usage note for existing collections:
    After deploying, call POST /vectors/collections/images/ensure-indexes with the payload fields actually used in filter_metadata (is_public, category_id, etc.) to add missing indexes. This is the highest-impact single action for filtered search.
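
The post-deploy step from the usage note can be sketched as a small script. It builds the `ensure-indexes` request for the `images` collection with the two fields named in the commit message. The helper name `build_ensure_indexes_request` is hypothetical. It also assumes that the gateway forwards the JSON body unchanged, that the optional `collection` field may be omitted, and that the `X-API-Key` header from the README curl examples applies here too.

```python
import json
from typing import Dict, List
from urllib.request import Request


def build_ensure_indexes_request(
    base_url: str,
    collection: str,
    api_key: str,
    fields: List[Dict[str, str]],
) -> Request:
    """Build (but do not send) the ensure-indexes request (hypothetical helper)."""
    normalized = []
    for spec in fields:
        name = spec.get("field")
        if not name:
            # Mirrors the server-side check so bad specs fail before any call.
            raise ValueError("Each field spec must include a 'field' key")
        # The endpoint defaults the type to "keyword" when it is omitted.
        normalized.append({"field": name, "type": spec.get("type", "keyword")})

    url = f"{base_url.rstrip('/')}/vectors/collections/{collection}/ensure-indexes"
    return Request(
        url=url,
        data=json.dumps({"fields": normalized}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_ensure_indexes_request(
        "https://vision.klevze.net",
        "images",
        "<your-api-key>",
        [
            {"field": "is_public", "type": "bool"},
            {"field": "category_id", "type": "integer"},
        ],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because the endpoint skips fields that are already indexed, running the same request again after a deploy should be harmless. The response lists which fields were `created` and which were `skipped`.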