Compare commits


4 Commits

Author SHA1 Message Date
3f925e17d5 docs: update README and USAGE for card-renderer, Qdrant optimization endpoints, and search params 2026-03-31 20:16:55 +02:00
6ea91c3452 fix: quantization in /configure, HNSW defaults in POST /collections, filter_metadata in search/file 2026-03-31 20:08:58 +02:00
609485a0f0 fix(qdrant): complete optimization gaps from v1
- qdrant/main.py: search/file now accepts hnsw_ef, exact, indexed_only form fields
  (was silently ignoring them, using server defaults only)
- qdrant/main.py: add GET /inspect endpoint — full diagnostic summary for all
  collections: HNSW, optimizer, quantization, segment count, payload index coverage,
  raw RAM estimate (vectors * dim * 4B * 1.5)
- gateway/main.py: vectors/search/file now forwards hnsw_ef, exact, indexed_only
- gateway/main.py: add GET /vectors/inspect proxy
2026-03-31 20:01:52 +02:00
c7ea347e2b feat(qdrant): optimization — payload indexes, HNSW tuning, search params (v1)
Inspection findings:
- _ensure_collection() created collections with bare VectorParams (no HNSW/optimizer config)
- _do_search() had no SearchParams — used Qdrant defaults (ef often ~100, no indexed_only)
- No payload index management at all — filtered searches scanned unindexed fields every time
- collection_info() returned minimal data — impossible to inspect production state
- No way to create/ensure payload indexes via the API

Changes — qdrant/main.py:
- Add SEARCH_HNSW_EF env var (default 128, above Qdrant default for better recall)
- _ensure_collection(): configure HnswConfigDiff(m=16, ef_construct=200, on_disk=False)
  and OptimizersConfigDiff(indexing_threshold=20000, default_segment_number=4) on creation
- _do_search(): use SearchParams(hnsw_ef, exact, indexed_only) on every query
- SearchUrlRequest + SearchVectorRequest: expose hnsw_ef, exact, indexed_only per request
- collection_info(): expand to full HNSW/optimizer/quantization/segment/payload_schema detail
- GET  /collections/{name}/indexes     — list all payload indexes
- POST /collections/{name}/indexes     — create a single payload index
- POST /collections/{name}/ensure-indexes — idempotent bulk index creation (skip existing)
- POST /collections/{name}/configure   — apply HNSW/optimizer changes to existing collections

Changes — gateway/main.py:
- Expose the 4 new qdrant-svc endpoints under /vectors/collections/{name}/...

Changes — docker-compose.yml:
- Add SEARCH_HNSW_EF=128 to qdrant-svc environment

Critical usage note for existing collections:
  After deploying, call POST /vectors/collections/images/ensure-indexes with the
  payload fields actually used in filter_metadata (is_public, category_id, etc.)
  to add missing indexes. This is the highest-impact single action for filtered search.
2026-03-31 19:58:47 +02:00
5 changed files with 575 additions and 18 deletions

README.md

@@ -1,6 +1,6 @@
# Skinbase Vision Stack (CLIP + BLIP + YOLO + Qdrant) Dockerized FastAPI
# Skinbase Vision Stack (CLIP + BLIP + YOLO + Qdrant + Card Renderer) Dockerized FastAPI
This repository provides **four standalone vision services** (CLIP / BLIP / YOLO / Qdrant)
This repository provides **five standalone vision services** (CLIP / BLIP / YOLO / Qdrant / Card Renderer)
and a **Gateway API** that can call them individually or together.
## Services & Ports
@@ -11,6 +11,7 @@ and a **Gateway API** that can call them individually or together.
- `yolo`: internal only
- `qdrant`: vector DB (port `6333` exposed for direct access)
- `qdrant-svc`: internal Qdrant API wrapper
- `card-renderer`: internal card rendering service
## Run
@@ -129,14 +130,17 @@ curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/up
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5,"filter_metadata":{"is_public":true}}'
```
Optional search parameters: `hnsw_ef` (int), `exact` (bool), `indexed_only` (bool), `score_threshold` (float), `filter_metadata` (object).
### Search similar images by file upload
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/search/file \
-F "file=@/path/to/image.webp" \
-F "limit=5"
-F "limit=5" \
-F 'filter_metadata_json={"is_public":true}'
```
### List collections
@@ -149,6 +153,38 @@ curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collection
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images
```
### Full diagnostic inspect
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/inspect
```
Returns HNSW config, optimizer config, quantization, segment count, payload index coverage, and RAM estimate for every collection.
### Payload index management
```bash
# List indexes
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images/indexes
# Create a single index
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/indexes \
-H "Content-Type: application/json" \
-d '{"field":"is_public","type":"bool"}'
# Ensure multiple indexes (idempotent)
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
-H "Content-Type: application/json" \
-d '{"fields":[{"field":"is_public","type":"bool"},{"field":"category_id","type":"integer"}]}'
```
Supported index types: `keyword`, `integer`, `float`, `bool`, `geo`, `datetime`, `text`, `uuid`.
### Collection configuration (HNSW / optimizer / quantization)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/collections/images/configure \
-H "Content-Type: application/json" \
-d '{"hnsw_m":16,"hnsw_ef_construct":200,"indexing_threshold":20000,"quantization_type":"int8"}'
```
### Delete points
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/delete \
@@ -158,6 +194,38 @@ curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/vectors/de
If you let the wrapper generate a UUID, use the returned `id` value for later `get`, `search`, or `delete` operations.
## Card Renderer
### List available templates
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/cards/templates
```
### Render a card from a URL
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title","username":"@artist","template":"nova-artwork-v1"}'
```
Returns binary image bytes (WebP by default).
### Render a card from a file upload
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/file \
-F "file=@/path/to/image.webp" \
-F "title=Artwork Title" \
-F "username=@artist" \
-F "template=nova-artwork-v1"
```
### Get card layout metadata (no image rendered)
```bash
curl -H "X-API-Key: <your-api-key>" -X POST https://vision.klevze.net/cards/render/meta \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
```
## Notes
- This is a **starter scaffold**. Models are loaded at service startup.

USAGE.md

@@ -4,7 +4,7 @@ This document explains how to run and use the Skinbase Vision Stack (Gateway + C
## Overview
- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc` (FastAPI each, except `qdrant` which is the official Qdrant DB).
- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc`, `card-renderer` (FastAPI each, except `qdrant` which is the official Qdrant DB).
- Gateway is the public API endpoint; the other services are internal.
## Model overview
@@ -17,6 +17,8 @@ This document explains how to run and use the Skinbase Vision Stack (Gateway + C
- **Qdrant**: High-performance vector similarity search engine. Stores CLIP image embeddings and enables reverse image search (find similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.
- **Card Renderer**: Generates branded social-card images (e.g. Open Graph previews) from artwork images. Applies smart center-weighted cropping, gradient overlays, title/username/tag text, and an optional logo. Returns binary image bytes (WebP by default). Template: `nova-artwork-v1`.
## Prerequisites
- Docker Desktop (with `docker compose`) or a Docker environment.
@@ -219,8 +221,11 @@ Parameters:
- `url` (required): query image URL.
- `limit` (optional, default 5): number of results.
- `score_threshold` (optional): minimum cosine similarity (0.0 to 1.0).
- `filter_metadata` (optional): filter results by metadata, e.g. `{"category":"wallpaper"}`.
- `filter_metadata` (optional): filter results by payload fields, e.g. `{"is_public":true,"category_id":3}`.
- `collection` (optional): collection to search.
- `hnsw_ef` (optional, int): override the HNSW ef parameter at query time. Higher = better recall, slightly more latency.
- `exact` (optional, bool, default false): brute-force exact search. Avoid on large collections.
- `indexed_only` (optional, bool, default false): restrict search to fully indexed segments only. Useful during bulk ingest.
Return: list of `{"id", "score", "metadata"}` sorted by similarity.
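A minimal Python sketch that builds the JSON body for `/vectors/search` on the client side. `build_search_payload` is a hypothetical helper and is not part of the gateway. The bounds it checks (`hnsw_ef` 1 to 512, `score_threshold` 0.0 to 1.0) are the ones declared on `SearchUrlRequest` in `qdrant/main.py`. If `hnsw_ef` is left unset, the service falls back to `SEARCH_HNSW_EF`.

```python
import json
from typing import Any, Dict, Optional


def build_search_payload(
    url: str,
    limit: int = 5,
    score_threshold: Optional[float] = None,
    filter_metadata: Optional[Dict[str, Any]] = None,
    collection: Optional[str] = None,
    hnsw_ef: Optional[int] = None,
    exact: bool = False,
    indexed_only: bool = False,
) -> Dict[str, Any]:
    """Build a JSON body for POST /vectors/search, omitting unset optionals."""
    # Mirror the server-side Field bounds so bad values fail before the request.
    if hnsw_ef is not None and not 1 <= hnsw_ef <= 512:
        raise ValueError("hnsw_ef must be between 1 and 512")
    if score_threshold is not None and not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0.0 and 1.0")
    body: Dict[str, Any] = {"url": url, "limit": limit, "exact": exact, "indexed_only": indexed_only}
    if score_threshold is not None:
        body["score_threshold"] = score_threshold
    if filter_metadata:
        body["filter_metadata"] = filter_metadata
    if collection is not None:
        body["collection"] = collection
    if hnsw_ef is not None:
        # Unset means the service uses its SEARCH_HNSW_EF default.
        body["hnsw_ef"] = hnsw_ef
    return body


payload = build_search_payload(
    "https://files.skinbase.org/img/aa/bb/cc/md.webp",
    filter_metadata={"is_public": True},
    hnsw_ef=256,
)
print(json.dumps(payload))
```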
@@ -230,16 +235,19 @@ Return: list of `{"id", "score", "metadata"}` sorted by similarity.
curl -X POST https://vision.klevze.net/vectors/search/file \
-H "X-API-Key: <your-api-key>" \
-F "file=@/path/to/image.webp" \
-F "limit=5"
-F "limit=5" \
-F 'filter_metadata_json={"is_public":true}'
```
All URL search parameters are available as form fields; use `filter_metadata_json` (JSON string) for filters.
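Multipart form fields are flat strings, so the filter must be JSON-encoded before it is sent. Below is a minimal sketch using a hypothetical `build_search_file_fields` helper. The field names match the `Form(...)` parameters of `/vectors/search/file`. Booleans are sent as `"true"` or `"false"`, which FastAPI coerces to `bool`.

```python
import json
from typing import Any, Dict, Optional


def build_search_file_fields(
    limit: int = 5,
    hnsw_ef: Optional[int] = None,
    exact: bool = False,
    indexed_only: bool = False,
    filter_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Return multipart form fields (all strings) for POST /vectors/search/file."""
    fields = {
        "limit": str(limit),
        # FastAPI parses "true"/"false" form values into bool parameters.
        "exact": "true" if exact else "false",
        "indexed_only": "true" if indexed_only else "false",
    }
    if hnsw_ef is not None:
        fields["hnsw_ef"] = str(hnsw_ef)
    if filter_metadata:
        # Form fields cannot carry nested objects, so the filter travels as JSON text.
        fields["filter_metadata_json"] = json.dumps(filter_metadata)
    return fields


print(build_search_file_fields(filter_metadata={"is_public": True}))
```

You can pass the resulting dict as `data=` alongside `files={"file": ...}` to `requests.post`, or hand it to any other multipart client.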
#### Search by pre-computed vector
```bash
curl -X POST https://vision.klevze.net/vectors/search/vector \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{"vector":[0.1,0.2,...],"limit":5}'
-d '{"vector":[0.1,0.2,...],"limit":5,"hnsw_ef":128}'
```
#### Collection management
@@ -267,6 +275,67 @@ Delete a collection:
curl -H "X-API-Key: <your-api-key>" -X DELETE https://vision.klevze.net/vectors/collections/my_collection
```
#### Full diagnostic inspect
Returns HNSW config, optimizer config, quantization, segment count, payload index coverage percentages, and RAM footprint estimate for every collection.
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/inspect
```
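The RAM figure in the inspect output is a heuristic: `vectors * dim * 4 bytes * 1.5`, reported in MiB. Index coverage is the share of points that a payload index covers. The stdlib sketch below reproduces the same arithmetic so you can size a deployment before ingesting. The function names are illustrative and not part of the service. The int8 figure estimates only the quantized copy; Qdrant still keeps the original float32 vectors, in RAM or on disk depending on configuration.

```python
def estimate_ram_mb(vectors: int, dim: int, bytes_per_value: int = 4, overhead: float = 1.5) -> float:
    """Raw vector RAM heuristic: vectors * dim * bytes * overhead, in MiB."""
    return round(vectors * dim * bytes_per_value * overhead / 1_048_576, 1)


def index_coverage_pct(indexed_points: int, total_points: int) -> float:
    """Share of points covered by a payload index, guarding against empty collections."""
    return round(indexed_points / max(total_points, 1) * 100, 1)


# One million 512-dim float32 CLIP vectors (4 bytes per value)
print(estimate_ram_mb(1_000_000, 512))  # 2929.7
# The same vectors as int8 scalar-quantized values (1 byte per value)
print(estimate_ram_mb(1_000_000, 512, bytes_per_value=1))  # 732.4
print(index_coverage_pct(750, 1_000))  # 75.0
```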
#### Payload index management
Payload indexes are critical for fast filtered vector search. Always create indexes for fields used in `filter_metadata` filters.
```bash
# List existing indexes
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/collections/images/indexes
# Create a single index
curl -X POST https://vision.klevze.net/vectors/collections/images/indexes \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{"field":"is_public","type":"bool"}'
# Ensure multiple indexes exist (idempotent — safe to run multiple times)
curl -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{"fields":[{"field":"is_public","type":"bool"},{"field":"is_deleted","type":"bool"},{"field":"category_id","type":"integer"},{"field":"user_id","type":"keyword"}]}'
```
Supported index types: `keyword`, `integer`, `float`, `bool`, `geo`, `datetime`, `text`, `uuid`.
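If your application code already builds a filter dict, you can derive the `ensure-indexes` body from it. This hypothetical helper uses a simple heuristic: Python `bool`, `int` and `float` map to `bool`, `integer` and `float`, and everything else falls back to `keyword`. Fields that need `datetime`, `uuid`, `text` or `geo` indexes must still be typed explicitly.

```python
from typing import Any, Dict, List


def infer_index_specs(sample_filter: Dict[str, Any]) -> List[Dict[str, str]]:
    """Guess a payload index type for each filter field from its Python value."""
    specs = []
    for field, value in sample_filter.items():
        # bool must be checked before int: bool is a subclass of int in Python.
        if isinstance(value, bool):
            index_type = "bool"
        elif isinstance(value, int):
            index_type = "integer"
        elif isinstance(value, float):
            index_type = "float"
        else:
            index_type = "keyword"
        specs.append({"field": field, "type": index_type})
    return specs


body = {"fields": infer_index_specs({"is_public": True, "category_id": 3, "user_id": "u-42"})}
print(body)
```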
#### Collection configuration (HNSW / optimizer / quantization)
Updates HNSW, optimizer, or scalar quantization settings on an existing collection without data loss. HNSW graph and segment changes apply to newly created segments.
```bash
curl -X POST https://vision.klevze.net/vectors/collections/images/configure \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"hnsw_m": 16,
"hnsw_ef_construct": 200,
"hnsw_on_disk": false,
"indexing_threshold": 20000,
"default_segment_number": 4,
"quantization_type": "int8",
"quantization_quantile": 0.99,
"quantization_always_ram": true
}'
```
Parameters:
- `hnsw_m` (int, 4 to 64): edges per node in the HNSW graph.
- `hnsw_ef_construct` (int, 10 to 1000): ef during index construction.
- `hnsw_on_disk` (bool): store HNSW graph on disk (saves RAM, slightly slower queries).
- `indexing_threshold` (int, KB): maximum size of unindexed vectors a segment may hold before Qdrant builds its HNSW index (Qdrant's default is 20000).
- `default_segment_number` (int, 1 to 32): target segment count for parallelism.
- `quantization_type` (string, `"int8"` or null): enable scalar quantization (~4× smaller than float32 vectors).
- `quantization_quantile` (float, 0.5 to 1.0, default 0.99): quantile of vector values used to set the int8 range; outliers beyond it are clipped.
- `quantization_always_ram` (bool, default true): keep quantized vectors in RAM.
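The ranges above match the pydantic `Field` bounds on `CollectionConfigRequest` in `qdrant/main.py`, so an out-of-range request is rejected with a 422 validation error. A hypothetical client-side helper can catch these mistakes earlier and drop unset keys. It is a sketch and does not replace the server-side checks. As on the server, `quantization_quantile` and `quantization_always_ram` on their own do not count as a change, because they are only read when `quantization_type` is set.

```python
from typing import Any, Dict, Optional, Tuple

# (low, high) bounds per field, copied from the documented ranges; None = open-ended.
_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "hnsw_m": (4, 64),
    "hnsw_ef_construct": (10, 1000),
    "indexing_threshold": (0, None),
    "default_segment_number": (1, 32),
    "quantization_quantile": (0.5, 1.0),
}


def build_configure_body(**settings: Any) -> Dict[str, Any]:
    """Drop unset values and reject out-of-range ones before POSTing to /configure."""
    body = {key: value for key, value in settings.items() if value is not None}
    for key, value in body.items():
        low, high = _RANGES.get(key, (None, None))
        if (low is not None and value < low) or (high is not None and value > high):
            raise ValueError(f"{key}={value!r} is outside the allowed range")
    qtype = body.get("quantization_type")
    if qtype is not None and qtype.lower() != "int8":
        raise ValueError("only 'int8' scalar quantization is supported")
    # Quantile and always_ram alone are not a change: they only apply with a type.
    if not set(body) - {"quantization_quantile", "quantization_always_ram"}:
        raise ValueError("no configuration fields provided")
    return body


print(build_configure_body(hnsw_m=16, hnsw_ef_construct=200, hnsw_on_disk=None, quantization_type="int8"))
```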
#### Delete points
```bash
@@ -290,6 +359,67 @@ If the wrapper had to replace your string `id` with a generated UUID, the origin
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/vectors/points/by-original-id/img-001
```
## Card Renderer
The card renderer generates branded social-card images from artwork photos. It applies smart center-weighted cropping, a gradient overlay, title/subtitle/username/category text, optional tags, and an optional logo.
Default output: 1200×630 WebP (`nova-artwork-v1` template).
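The service's "smart center-weighted" cropping logic is not shown here. As a baseline for checking its output, a plain geometric center crop to the 1200×630 aspect ratio looks like this. `center_crop_box` is hypothetical. The returned box uses the `(left, top, right, bottom)` order that Pillow's `Image.crop` expects.

```python
from typing import Tuple


def center_crop_box(src_w: int, src_h: int, dst_w: int = 1200, dst_h: int = 630) -> Tuple[int, int, int, int]:
    """Largest centered box with the target aspect ratio, as (left, top, right, bottom)."""
    target = dst_w / dst_h
    if src_w / src_h > target:
        # Source is wider than the card: keep full height, trim the sides.
        crop_w, crop_h = round(src_h * target), src_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w, crop_h = src_w, round(src_w / target)
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(center_crop_box(2000, 2000))
```

After cropping, resize to exactly 1200×630 so that rounding in the box size does not change the final dimensions.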
### List available templates
```bash
curl -H "X-API-Key: <your-api-key>" https://vision.klevze.net/cards/templates
```
### Render a card from a URL
```bash
curl -X POST https://vision.klevze.net/cards/render \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{
"url": "https://files.skinbase.org/img/aa/bb/cc/md.webp",
"title": "Artwork Title",
"subtitle": "Optional subtitle",
"username": "@artist",
"category": "Digital Art",
"tags": ["surreal", "landscape"],
"template": "nova-artwork-v1",
"width": 1200,
"height": 630,
"output": "webp",
"quality": 90,
"show_logo": true
}'
```
Returns binary image bytes with `Content-Type: image/webp`.
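Because the endpoint returns raw bytes, a client should confirm it received an image, not a JSON error body, before saving it. WebP files are RIFF containers: bytes 0 to 3 are `RIFF` and bytes 8 to 11 are `WEBP`. The helper names below are illustrative. If `output` is set to another format, change the check to match.

```python
def is_webp(data: bytes) -> bool:
    """True if the bytes start with a RIFF container whose form type is WEBP."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def save_card(data: bytes, path: str) -> None:
    """Write rendered card bytes to disk after a cheap format sanity check."""
    if not is_webp(data):
        raise ValueError("response body is not a WebP image")
    with open(path, "wb") as fh:
        fh.write(data)
```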
### Render a card from a file upload
```bash
curl -X POST https://vision.klevze.net/cards/render/file \
-H "X-API-Key: <your-api-key>" \
-F "file=@/path/to/image.webp" \
-F "title=Artwork Title" \
-F "username=@artist" \
-F "template=nova-artwork-v1" \
-F "show_logo=true"
```
Returns binary image bytes.
### Get card layout metadata (no image rendered)
```bash
curl -X POST https://vision.klevze.net/cards/render/meta \
-H "X-API-Key: <your-api-key>" \
-H "Content-Type: application/json" \
-d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
```
Returns crop coordinates and layout data without producing an image.
## Request/Response notes
- For URL requests use `Content-Type: application/json`.
@@ -340,9 +470,5 @@ uvicorn main:app --host 0.0.0.0 --port 8000
- `gateway/` — gateway FastAPI server.
- `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
- `qdrant/` — Qdrant API wrapper service (FastAPI).
- `card-renderer/` — card rendering service (FastAPI).
- `common/` — shared helpers (e.g., image I/O).

docker-compose.yml

@@ -77,6 +77,7 @@ services:
- CLIP_URL=http://clip:8000
- COLLECTION_NAME=images
- VECTOR_DIM=512
- SEARCH_HNSW_EF=128
depends_on:
qdrant:
condition: service_healthy

gateway/main.py

@@ -243,13 +243,21 @@ async def vectors_search_file(
limit: int = Form(5),
score_threshold: Optional[float] = Form(None),
collection: Optional[str] = Form(None),
hnsw_ef: Optional[int] = Form(None),
exact: bool = Form(False),
indexed_only: bool = Form(False),
filter_metadata_json: Optional[str] = Form(None),
):
data = await file.read()
fields: Dict[str, Any] = {"limit": int(limit)}
fields: Dict[str, Any] = {"limit": int(limit), "exact": exact, "indexed_only": indexed_only}
if score_threshold is not None:
fields["score_threshold"] = float(score_threshold)
if collection is not None:
fields["collection"] = collection
if hnsw_ef is not None:
fields["hnsw_ef"] = int(hnsw_ef)
if filter_metadata_json is not None:
fields["filter_metadata_json"] = filter_metadata_json
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _post_file(client, f"{QDRANT_SVC_URL}/search/file", data, fields)
@@ -284,6 +292,13 @@ async def vectors_collection_info(name: str):
return await _get_json(client, f"{QDRANT_SVC_URL}/collections/{name}")
@app.get("/vectors/inspect")
async def vectors_inspect():
"""Full diagnostic summary for all Qdrant collections (HNSW, optimizer, payload indexes, RAM estimate)."""
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _get_json(client, f"{QDRANT_SVC_URL}/inspect")
@app.delete("/vectors/collections/{name}")
async def vectors_delete_collection(name: str):
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
@@ -416,3 +431,33 @@ async def cards_render_meta(payload: Dict[str, Any]):
"""Return crop and layout metadata for a card render (no image produced)."""
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _post_json(client, f"{CARD_RENDERER_URL}/render/meta", payload)
# ---- Qdrant administration endpoints (index management + collection config) ----
@app.get("/vectors/collections/{name}/indexes")
async def vectors_collection_indexes(name: str):
"""List payload indexes for a collection."""
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _get_json(client, f"{QDRANT_SVC_URL}/collections/{name}/indexes")
@app.post("/vectors/collections/{name}/indexes")
async def vectors_create_payload_index(name: str, payload: Dict[str, Any]):
"""Create a payload index on a field in a collection."""
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _post_json(client, f"{QDRANT_SVC_URL}/collections/{name}/indexes", payload)
@app.post("/vectors/collections/{name}/ensure-indexes")
async def vectors_ensure_indexes(name: str, payload: Dict[str, Any]):
"""Idempotently ensure payload indexes exist for a list of fields."""
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _post_json(client, f"{QDRANT_SVC_URL}/collections/{name}/ensure-indexes", payload)
@app.post("/vectors/collections/{name}/configure")
async def vectors_configure_collection(name: str, payload: Dict[str, Any]):
"""Update HNSW and optimizer configuration for a collection."""
async with httpx.AsyncClient(timeout=VISION_TIMEOUT) as client:
return await _post_json(client, f"{QDRANT_SVC_URL}/collections/{name}/configure", payload)

qdrant/main.py

@@ -16,6 +16,12 @@ from qdrant_client.models import (
Filter,
FieldCondition,
MatchValue,
HnswConfigDiff,
OptimizersConfigDiff,
SearchParams,
PayloadSchemaType,
ScalarQuantizationConfig,
ScalarType,
)
# ---------------------------------------------------------------------------
@@ -27,6 +33,8 @@ QDRANT_PORT = int(os.getenv("QDRANT_PORT", "6333"))
CLIP_URL = os.getenv("CLIP_URL", "http://clip:8000")
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "images")
VECTOR_DIM = int(os.getenv("VECTOR_DIM", "512"))
# hnsw_ef at query time: higher = better recall, slightly more latency (Qdrant default ~100)
SEARCH_HNSW_EF = int(os.getenv("SEARCH_HNSW_EF", "128"))
app = FastAPI(title="Skinbase Qdrant Service", version="1.0.0")
client: QdrantClient = None # type: ignore[assignment]
@@ -44,12 +52,21 @@ def startup():
def _ensure_collection():
"""Create the default collection if it does not exist yet."""
"""Create the default collection with production-friendly defaults if it does not exist yet."""
collections = [c.name for c in client.get_collections().collections]
if COLLECTION_NAME not in collections:
client.create_collection(
collection_name=COLLECTION_NAME,
vectors_config=VectorParams(size=VECTOR_DIM, distance=Distance.COSINE),
hnsw_config=HnswConfigDiff(
m=16,
ef_construct=200, # higher than default 100 = better index quality
on_disk=False, # keep HNSW graph in RAM for fast traversal
),
optimizers_config=OptimizersConfigDiff(
indexing_threshold=20000, # size in KB of unindexed vectors per segment before HNSW is built (Qdrant's default)
default_segment_number=4, # parallelism-friendly segment count
),
)
@@ -77,6 +94,9 @@ class SearchUrlRequest(BaseModel):
score_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0)
collection: Optional[str] = None
filter_metadata: Dict[str, Any] = Field(default_factory=dict)
hnsw_ef: Optional[int] = Field(default=None, ge=1, le=512, description="Override ef at query time. Higher = better recall, slightly higher latency.")
exact: bool = Field(default=False, description="Brute-force exact search. Avoid on large collections.")
indexed_only: bool = Field(default=False, description="Search only fully indexed segments. Useful during bulk ingest.")
class SearchVectorRequest(BaseModel):
@@ -85,6 +105,9 @@ class SearchVectorRequest(BaseModel):
score_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0)
collection: Optional[str] = None
filter_metadata: Dict[str, Any] = Field(default_factory=dict)
hnsw_ef: Optional[int] = Field(default=None, ge=1, le=512)
exact: bool = False
indexed_only: bool = False
class DeleteRequest(BaseModel):
@@ -189,6 +212,79 @@ def health():
return {"status": "error", "detail": str(e)}
@app.get("/inspect")
def inspect():
"""Return a full diagnostic summary for every collection.
Covers: vector counts, segment counts, HNSW config, optimizer config,
quantization, payload indexes and their coverage. Designed for production
health checks and the Qdrant optimization workflow.
"""
try:
all_collections = client.get_collections().collections
except Exception as exc:
return {"status": "error", "detail": str(exc)}
result = {}
for col_desc in all_collections:
name = col_desc.name
try:
info = client.get_collection(name)
cfg = info.config
hnsw = cfg.hnsw_config
opt = cfg.optimizer_config
quant = cfg.quantization_config
params = cfg.params
# Estimate raw RAM footprint: vectors * dim * 4 bytes * 1.5 safety factor
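vec_count = info.vectors_count or info.points_count or 0 # vectors_count is Optional in qdrant-client; fall back to points_count (one vector per point here)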
vec_dim = (
params.vectors.size
if hasattr(params.vectors, "size")
else VECTOR_DIM
)
ram_estimate_mb = round(vec_count * vec_dim * 4 * 1.5 / 1_048_576, 1)
result[name] = {
"status": info.status.value if info.status else None,
"optimizer_status": str(info.optimizer_status) if info.optimizer_status else None,
"vectors_count": vec_count,
"indexed_vectors_count": info.indexed_vectors_count,
"points_count": info.points_count,
"segments_count": info.segments_count,
"ram_estimate_mb": ram_estimate_mb,
"hnsw": {
"m": hnsw.m,
"ef_construct": hnsw.ef_construct,
"on_disk": hnsw.on_disk,
"full_scan_threshold": hnsw.full_scan_threshold,
"max_indexing_threads": hnsw.max_indexing_threads,
} if hnsw else None,
"optimizer": {
"indexing_threshold": opt.indexing_threshold,
"default_segment_number": opt.default_segment_number,
"max_segment_size": opt.max_segment_size,
"memmap_threshold": opt.memmap_threshold,
"flush_interval_sec": opt.flush_interval_sec,
} if opt else None,
"quantization": str(quant) if quant else None,
"payload_indexes": {
k: {
"type": v.data_type.value if hasattr(v.data_type, "value") else str(v.data_type),
"points": v.points,
# payload indexes count points, so measure coverage against points_count
"coverage_pct": round(v.points / max(info.points_count or 0, 1) * 100, 1),
}
for k, v in (info.payload_schema or {}).items()
},
"payload_index_count": len(info.payload_schema or {}),
"search_hnsw_ef": SEARCH_HNSW_EF,
}
except Exception as exc:
result[name] = {"error": str(exc)}
return {"collections": result, "total": len(result)}
# ---------------------------------------------------------------------------
# Collection management
# ---------------------------------------------------------------------------
@@ -204,9 +300,13 @@ def create_collection(req: CollectionRequest):
if req.name in collections:
raise HTTPException(409, f"Collection '{req.name}' already exists")
# Apply the same production defaults as _ensure_collection so all
# collections start with tuned HNSW and optimizer settings.
client.create_collection(
collection_name=req.name,
vectors_config=VectorParams(size=req.vector_dim, distance=dist),
hnsw_config=HnswConfigDiff(m=16, ef_construct=200, on_disk=False),
optimizers_config=OptimizersConfigDiff(indexing_threshold=20000, default_segment_number=4),
)
return {"created": req.name, "vector_dim": req.vector_dim, "distance": req.distance}
@@ -221,11 +321,40 @@ def list_collections():
def collection_info(name: str):
try:
info = client.get_collection(name)
cfg = info.config
hnsw = cfg.hnsw_config
opt = cfg.optimizer_config
quant = cfg.quantization_config
return {
"name": name,
"vectors_count": info.vectors_count,
"indexed_vectors_count": info.indexed_vectors_count,
"points_count": info.points_count,
"segments_count": info.segments_count,
"status": info.status.value if info.status else None,
"optimizer_status": str(info.optimizer_status) if info.optimizer_status else None,
"hnsw": {
"m": hnsw.m,
"ef_construct": hnsw.ef_construct,
"on_disk": hnsw.on_disk,
"full_scan_threshold": hnsw.full_scan_threshold,
"max_indexing_threads": hnsw.max_indexing_threads,
} if hnsw else None,
"optimizer": {
"indexing_threshold": opt.indexing_threshold,
"default_segment_number": opt.default_segment_number,
"max_segment_size": opt.max_segment_size,
"memmap_threshold": opt.memmap_threshold,
"flush_interval_sec": opt.flush_interval_sec,
} if opt else None,
"quantization": str(quant) if quant else None,
"payload_schema": {
k: {
"type": v.data_type.value if hasattr(v.data_type, "value") else str(v.data_type),
"points": v.points,
}
for k, v in (info.payload_schema or {}).items()
},
}
except Exception as e:
raise HTTPException(404, str(e))
@@ -325,7 +454,7 @@ def upsert_vector(req: UpsertVectorRequest):
async def search_url(req: SearchUrlRequest):
"""Embed an image by URL via CLIP, then search Qdrant for similar vectors."""
vector = await _embed_url(req.url)
return _do_search(vector, req.limit, req.score_threshold, req.collection, req.filter_metadata)
return _do_search(vector, req.limit, req.score_threshold, req.collection, req.filter_metadata, req.hnsw_ef, req.exact, req.indexed_only)
@app.post("/search/file")
@@ -334,17 +463,28 @@ async def search_file(
limit: int = Form(5),
score_threshold: Optional[float] = Form(None),
collection: Optional[str] = Form(None),
hnsw_ef: Optional[int] = Form(None),
exact: bool = Form(False),
indexed_only: bool = Form(False),
filter_metadata_json: Optional[str] = Form(None),
):
"""Embed an uploaded image via CLIP, then search Qdrant for similar vectors."""
import json
filter_metadata: Dict[str, Any] = {}
if filter_metadata_json:
try:
filter_metadata = json.loads(filter_metadata_json)
except json.JSONDecodeError:
raise HTTPException(400, "filter_metadata_json must be valid JSON")
data = await file.read()
vector = await _embed_bytes(data)
return _do_search(vector, int(limit), score_threshold, collection, {})
return _do_search(vector, int(limit), score_threshold, collection, filter_metadata, hnsw_ef, exact, indexed_only)
@app.post("/search/vector")
def search_vector(req: SearchVectorRequest):
"""Search Qdrant using a pre-computed vector."""
return _do_search(req.vector, req.limit, req.score_threshold, req.collection, req.filter_metadata)
return _do_search(req.vector, req.limit, req.score_threshold, req.collection, req.filter_metadata, req.hnsw_ef, req.exact, req.indexed_only)
def _do_search(
@@ -353,9 +493,13 @@ def _do_search(
score_threshold: Optional[float],
collection: Optional[str],
filter_metadata: Dict[str, Any],
hnsw_ef: Optional[int] = None,
exact: bool = False,
indexed_only: bool = False,
):
col = _col(collection)
qfilter = _build_filter(filter_metadata)
ef = hnsw_ef if hnsw_ef is not None else SEARCH_HNSW_EF
results = client.query_points(
collection_name=col,
@@ -363,6 +507,7 @@ def _do_search(
limit=limit,
score_threshold=score_threshold,
query_filter=qfilter,
search_params=SearchParams(hnsw_ef=ef, exact=exact, indexed_only=indexed_only),
)
hits = []
@@ -438,3 +583,175 @@ def get_point_by_original_id(original_id: str, collection: Optional[str] = None)
raise
except Exception as e:
raise HTTPException(404, str(e))
# ---------------------------------------------------------------------------
# Payload index management
# ---------------------------------------------------------------------------
_SCHEMA_TYPE_MAP: Dict[str, PayloadSchemaType] = {
t.value: t for t in PayloadSchemaType
}
def _resolve_schema_type(type_str: str) -> PayloadSchemaType:
schema = _SCHEMA_TYPE_MAP.get(type_str.lower())
if schema is None:
raise HTTPException(400, f"Unknown index type '{type_str}'. Valid: {', '.join(_SCHEMA_TYPE_MAP)}")
return schema
class PayloadIndexRequest(BaseModel):
field: str
type: str = Field(default="keyword", description="keyword | integer | float | bool | geo | datetime | text | uuid")
collection: Optional[str] = None
class EnsureIndexesRequest(BaseModel):
"""List of field specs, each with 'field' and optional 'type' keys."""
fields: List[Dict[str, str]]
collection: Optional[str] = None
@app.get("/collections/{name}/indexes")
def collection_indexes(name: str):
"""List all payload indexes for a collection."""
try:
info = client.get_collection(name)
schema = info.payload_schema or {}
return {
"collection": name,
"indexes": {
k: {
"type": v.data_type.value if hasattr(v.data_type, "value") else str(v.data_type),
"points": v.points,
}
for k, v in schema.items()
},
"count": len(schema),
}
except Exception as e:
raise HTTPException(404, str(e))
@app.post("/collections/{name}/indexes")
def create_index(name: str, req: PayloadIndexRequest):
"""Create a payload index on a single field."""
col = req.collection or name
schema = _resolve_schema_type(req.type)
try:
client.create_payload_index(
collection_name=col,
field_name=req.field,
field_schema=schema,
)
return {"collection": col, "field": req.field, "type": req.type, "status": "created"}
except Exception as e:
raise HTTPException(500, str(e))
@app.post("/collections/{name}/ensure-indexes")
def ensure_indexes(name: str, req: EnsureIndexesRequest):
"""Idempotently ensure payload indexes exist for a list of fields.
Skips fields that are already indexed; only creates the missing ones.
Example body: {"fields": [{"field": "is_public", "type": "bool"}, {"field": "category_id", "type": "integer"}]}
"""
col = req.collection or name
try:
info = client.get_collection(col)
except Exception as e:
raise HTTPException(404, str(e))
existing = set(info.payload_schema.keys()) if info.payload_schema else set()
created: List[str] = []
skipped: List[str] = []
for field_spec in req.fields:
field = field_spec.get("field")
type_str = field_spec.get("type", "keyword")
if not field:
raise HTTPException(400, "Each field spec must include a 'field' key")
if field in existing:
skipped.append(field)
continue
schema = _resolve_schema_type(type_str)
try:
client.create_payload_index(
collection_name=col,
field_name=field,
field_schema=schema,
)
created.append(field)
except Exception as exc:
raise HTTPException(500, f"Failed to index '{field}': {exc}")
return {"collection": col, "created": created, "skipped": skipped}
# ---------------------------------------------------------------------------
# Collection HNSW + optimizer configuration
# ---------------------------------------------------------------------------
class CollectionConfigRequest(BaseModel):
hnsw_m: Optional[int] = Field(default=None, ge=4, le=64, description="Edges per node in the HNSW graph.")
hnsw_ef_construct: Optional[int] = Field(default=None, ge=10, le=1000, description="ef during index construction. Changes apply to new segments only.")
hnsw_on_disk: Optional[bool] = Field(default=None, description="Store HNSW graph on disk (saves RAM, slightly slower queries).")
indexing_threshold: Optional[int] = Field(default=None, ge=0, description="Size in KB of unindexed vectors a segment may hold before Qdrant builds its HNSW index.")
default_segment_number: Optional[int] = Field(default=None, ge=1, le=32, description="Target number of segments for parallelism.")
# Scalar quantization — reduces RAM ~4x, often speeds up search on large collections.
# Set quantization_type='int8' to enable. Use always_ram=True to keep quantized
# vectors in RAM (recommended on VPS with limited memory but fast disk).
quantization_type: Optional[str] = Field(default=None, description="Enable scalar quantization: 'int8'. Set to null to keep current setting.")
quantization_quantile: float = Field(default=0.99, ge=0.5, le=1.0, description="Quantile of vector values used to set the int8 range; values beyond it are clipped (0.99 recommended).")
quantization_always_ram: bool = Field(default=True, description="Keep quantized vectors in RAM even when raw vectors are on disk.")
@app.post("/collections/{name}/configure")
def configure_collection(name: str, req: CollectionConfigRequest):
"""Apply HNSW and optimizer configuration updates to an existing collection.
Changes are applied in-place without data loss or re-ingestion.
Note: hnsw_m and hnsw_ef_construct only affect newly created segments.
"""
hnsw_kwargs = {k: v for k, v in {
"m": req.hnsw_m,
"ef_construct": req.hnsw_ef_construct,
"on_disk": req.hnsw_on_disk,
}.items() if v is not None}
opt_kwargs = {k: v for k, v in {
"indexing_threshold": req.indexing_threshold,
"default_segment_number": req.default_segment_number,
}.items() if v is not None}
# Build optional scalar quantization config
quant_config = None
if req.quantization_type is not None:
if req.quantization_type.lower() != "int8":
raise HTTPException(400, f"Unsupported quantization_type '{req.quantization_type}'. Only 'int8' is supported.")
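# update_collection expects the ScalarQuantization wrapper, not the bare config
from qdrant_client.models import ScalarQuantization
quant_config = ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8,
quantile=req.quantization_quantile,
always_ram=req.quantization_always_ram,
),
)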
if not hnsw_kwargs and not opt_kwargs and quant_config is None:
raise HTTPException(400, "No configuration fields provided")
try:
client.update_collection(
collection_name=name,
hnsw_config=HnswConfigDiff(**hnsw_kwargs) if hnsw_kwargs else None,
optimizers_config=OptimizersConfigDiff(**opt_kwargs) if opt_kwargs else None,
quantization_config=quant_config,
)
return {
"collection": name,
"status": "updated",
"hnsw_changes": hnsw_kwargs,
"optimizer_changes": opt_kwargs,
"quantization": {"type": req.quantization_type, "quantile": req.quantization_quantile, "always_ram": req.quantization_always_ram} if quant_config else None,
}
except Exception as exc:
raise HTTPException(500, str(exc))
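You can separate the skip-or-create decision in `ensure_indexes` from the Qdrant calls so it can be unit-tested. The sketch below uses a hypothetical helper, `plan_index_changes`, which is not part of the service. It differs from the endpoint in two deliberate ways:

- It validates every spec before returning anything. The endpoint raises mid-loop, after earlier indexes may already exist.
- It treats a field repeated within one request as skipped.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Index types the service accepts (the PayloadSchemaType values).
VALID_TYPES = {"keyword", "integer", "float", "bool", "geo", "datetime", "text", "uuid"}


def plan_index_changes(
    existing: Set[str], requested: Iterable[Dict[str, str]]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (field, type) pairs to create and the fields to skip.

    Every spec is validated before anything is returned, so a bad entry late in
    the list cannot leave earlier indexes half-applied.
    """
    seen = set(existing)
    to_create: List[Tuple[str, str]] = []
    skipped: List[str] = []
    for spec in requested:
        field = spec.get("field")
        if not field:
            raise ValueError("each field spec must include a 'field' key")
        if field in seen:
            # Already indexed, or already queued earlier in this request.
            skipped.append(field)
            continue
        index_type = spec.get("type", "keyword").lower()
        if index_type not in VALID_TYPES:
            raise ValueError(f"unknown index type '{index_type}'")
        seen.add(field)
        to_create.append((field, index_type))
    return to_create, skipped
```

A caller would then loop over `to_create` and call `client.create_payload_index(...)` once per pair, as the endpoint does.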