WordFor Benchmark

Measures scoring speed and end-to-end query latency in your browser.

Model Quality (67-query test set)

Results of an offline evaluation on pre-built embeddings, regenerated each time compare_eval.py is run.

| Method | Mode | MRR | Hit@1 | Hit@6 | Notes |
| Potion base int4 | Lite | 0.5104 | 28/67 | 41/67 | distilled-mxbai base, int4 scoring |
| Potion fine-tuned int4 | Lite | 0.5776 | 35/67 | 43/67 | distilled-mxbai fine-tuned, int4 scoring |
| Full pure binary (ITQ) | Full | 0.6337 | 36/67 | 55/67 | mdbr-leaf-mt, 1-bit binary ITQ, 48 bytes/entry |
| Full pure int2 | Full | 0.6111 | 35/67 | 47/67 | mdbr-leaf-mt, pure int2, 96 bytes/entry |
| Full pure int3 | Full | 0.6351 | 35/67 | 53/67 | mdbr-leaf-mt, pure int3, 144 bytes/entry |
| Full pure int4 | Full | 0.6349 | 36/67 | 53/67 | mdbr-leaf-mt, pure int4, 192 bytes/entry |
| Full pure int8 | Full | 0.6431 | 36/67 | 55/67 | mdbr-leaf-mt, pure int8, 384 bytes/entry |
| Full binary+int3 rerank | Full | 0.6339 | 35/67 | 53/67 | mdbr-leaf-mt, binary+int3 rerank, 192 bytes/entry (desktop) |
| Full binary+int4 rerank | Full | 0.6353 | 36/67 | 53/67 | mdbr-leaf-mt, binary+int4 rerank, 240 bytes/entry |
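
For readers unfamiliar with the MRR, Hit@1 and Hit@6 columns, the sketch below computes them from a list of per-query ranks using the standard definitions. It is an illustration only: whether compare_eval.py handles misses or ties the same way is an assumption, and the `ranks` list is made-up sample data, not taken from the 67-query test set.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries; a miss (None) contributes 0."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def hit_at_k(ranks: Sequence[Optional[int]], k: int) -> int:
    """Number of queries whose expected answer appears in the top k results."""
    return sum(1 for r in ranks if r is not None and r <= k)


# Hypothetical 1-based ranks of the expected word for six queries.
# None means the expected word was not returned at all.
ranks = [1, 3, None, 2, 7, 1]

mrr = mean_reciprocal_rank(ranks)
hits1 = hit_at_k(ranks, 1)
hits6 = hit_at_k(ranks, 6)
print(f"MRR {mrr:.4f}  Hit@1 {hits1}/{len(ranks)}  Hit@6 {hits6}/{len(ranks)}")
```

Hit counts are kept as raw counts rather than fractions so they can be printed in the same `n/67` style as the table.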

Scoring Performance (per-query, all entries)

Raw vector scoring speed for each quantization format, with no model inference. Each format is timed over 20 iterations using random query vectors. Use these timings alongside the offline MRR table above to choose a format.
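
The measurement described above can be sketched offline as follows. This is not the page's browser code: it is a minimal Python approximation that scores 384-bit binary codes (48 bytes/entry, as in the binary row of the table) by Hamming distance and averages the time over 20 random queries. `NUM_ENTRIES`, the seeds, and the function names are hypothetical choices for the sketch.

```python
import random
import time
from typing import List

BITS = 384          # 48 bytes/entry, matching the 1-bit binary row
NUM_ENTRIES = 2000  # hypothetical corpus size, kept small so the sketch runs fast
ITERATIONS = 20     # iteration count stated on the page


def hamming_scores(query: int, entries: List[int]) -> List[int]:
    """Hamming distance from the query to every entry (lower is closer)."""
    return [bin(query ^ entry).count("1") for entry in entries]


def benchmark(entries: List[int], iterations: int = ITERATIONS, seed: int = 0) -> float:
    """Average scoring time per query in milliseconds, excluding query generation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        query = rng.getrandbits(BITS)  # random query vector, no model inference
        start = time.perf_counter()
        hamming_scores(query, entries)
        total += time.perf_counter() - start
    return total / iterations * 1000.0


rng = random.Random(42)
entries = [rng.getrandbits(BITS) for _ in range(NUM_ENTRIES)]
avg_ms = benchmark(entries)
print(f"binary scoring: {avg_ms:.3f} ms/query over {NUM_ENTRIES} entries")
```

Absolute numbers from this sketch are not comparable to the in-browser results; only the structure of the loop (random query, score all entries, average over iterations) is meant to carry over.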

End-to-End Latency

Full pipeline: model load + tokenization + inference + scoring + ranking.
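
A rough sketch of how the one-off load time can be separated from the per-query average, which is how the Load (s) and Query avg (ms) columns below appear to be split. The `fake_load` and `fake_query` functions are stand-ins invented for this example; they do not reflect the real model or scoring code.

```python
import time
from typing import Callable, List, Tuple


def time_pipeline(
    load: Callable[[], object],
    run_query: Callable[[object, str], object],
    queries: List[str],
) -> Tuple[float, float]:
    """Return (load time in seconds, average query time in milliseconds)."""
    if not queries:
        raise ValueError("at least one query is required")

    start = time.perf_counter()
    model = load()  # one-off cost: model load
    load_s = time.perf_counter() - start

    start = time.perf_counter()
    for text in queries:
        run_query(model, text)  # tokenization + inference + scoring + ranking
    query_avg_ms = (time.perf_counter() - start) / len(queries) * 1000.0
    return load_s, query_avg_ms


def fake_load() -> dict:
    # Stand-in for loading a model: a tiny hypothetical vocabulary.
    return {"vocab": {"bright": 0, "light": 1}}


def fake_query(model: dict, text: str) -> list:
    tokens = text.lower().split()                        # tokenization
    ids = [model["vocab"].get(t, -1) for t in tokens]    # stands in for inference
    scored = [(tok_id, pos) for pos, tok_id in enumerate(ids)]  # stands in for scoring
    return sorted(scored, reverse=True)                  # ranking


load_s, query_avg_ms = time_pipeline(
    fake_load, fake_query, ["bright light", "a word for happy"]
)
print(f"load {load_s:.4f} s, query avg {query_avg_ms:.4f} ms")
```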

| Test | Status | Load (s) | Query avg (ms) | Notes |

Log