WordFor Benchmark

Measures scoring speed and end-to-end query latency in your browser.

Model Quality (67-query test set)

Offline evaluation on pre-built embeddings. This table is regenerated each time compare_eval.py is run.
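The MRR and Hit@k columns summarize where each query's answer lands in a ranked list. Below is a minimal sketch of how such metrics can be computed. It is not the code in compare_eval.py. It assumes that each test query has a set of accepted target words, that a query's reciprocal rank is 1/rank of the first accepted word (0 if none appears), and that Hit@k counts queries with an accepted word in the top k. The function names and toy data are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant word (1-based), or 0.0 if none is ranked."""
    for rank, word in enumerate(ranked, start=1):
        if word in relevant:
            return 1.0 / rank
    return 0.0


def hit_at_k(ranked, relevant, k):
    """True if any relevant word appears among the top k results."""
    return any(word in relevant for word in ranked[:k])


def evaluate(results):
    """results: list of (ranked_words, relevant_set) pairs, one per test query.

    Returns (MRR, Hit@1 count, Hit@6 count), mirroring the table's columns.
    """
    n = len(results)
    mrr = sum(reciprocal_rank(r, rel) for r, rel in results) / n
    hit1 = sum(hit_at_k(r, rel, 1) for r, rel in results)
    hit6 = sum(hit_at_k(r, rel, 6) for r, rel in results)
    return mrr, hit1, hit6


# Toy example with three queries (illustrative data, not from the 67-query set).
results = [
    (["happy", "glad", "joyful"], {"happy"}),   # rank 1 -> 1.0
    (["run", "sprint", "dash"], {"dash"}),      # rank 3 -> 0.333...
    (["big", "large", "huge"], {"enormous"}),   # not found -> 0.0
]
mrr, hit1, hit6 = evaluate(results)
print(f"MRR={mrr:.4f} Hit@1={hit1}/{len(results)} Hit@6={hit6}/{len(results)}")
# prints: MRR=0.4444 Hit@1=1/3 Hit@6=2/3
```

Because MRR averages 1/rank, a single miss at rank 1 that lands at rank 2 costs half a point for that query. Hit@k only records whether the answer made the cutoff.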

Method	Mode	MRR	Hit@1	Hit@6	Notes
Potion base int4	Lite	0.4931	26/67	41/67	distilled-mxbai base, int4 scoring
Potion fine-tuned int4	Lite	0.5566	32/67	43/67	distilled-mxbai fine-tuned, int4 scoring
Full pure binary (ITQ)	Full	0.5835	33/67	46/67	mdbr-leaf-mt, 1-bit binary ITQ, 48 bytes/entry
Full pure int2	Full	0.5837	32/67	48/67	mdbr-leaf-mt, pure int2, 96 bytes/entry
Full pure int3	Full	0.6650	38/67	55/67	mdbr-leaf-mt, pure int3, 144 bytes/entry
Full pure int4	Full	0.6442	36/67	53/67	mdbr-leaf-mt, pure int4, 192 bytes/entry
Full pure int8	Full	0.6408	36/67	55/67	mdbr-leaf-mt, pure int8, 384 bytes/entry
Full binary+int3 rerank	Full	0.6617	38/67	54/67	mdbr-leaf-mt, binary+int3 rerank, 192 bytes/entry (desktop)
Full binary+int4 rerank	Full	0.6430	36/67	53/67	mdbr-leaf-mt, binary+int4 rerank, 240 bytes/entry
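The bytes/entry figures follow from the embedding width. At 1 bit per dimension, 48 bytes gives 384 dimensions, so each extra bit of precision adds 48 bytes, and the rerank rows store both codes (48 + 144 = 192, 48 + 192 = 240). The sketch below shows one way a binary shortlist followed by an intN rerank can work. It is not the WordFor implementation and makes several simplifications. It uses plain sign binarization instead of ITQ, which additionally learns a rotation before taking signs. It uses a single max-abs scale per vector. The names (`binarize`, `quantize`, `search`) and the shortlist size are hypothetical.

```python
import random

DIM = 384  # inferred from the table: 48 bytes/entry at 1 bit per dimension


def bytes_per_entry(bits, dim=DIM):
    """Storage for one vector at `bits` bits per dimension."""
    return dim * bits // 8


def binarize(vec):
    """Pack the sign of each dimension into an int (plain sign, no ITQ rotation)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def quantize(vec, bits):
    """Symmetric scalar quantization to signed ints with one max-abs scale per vector."""
    levels = 2 ** (bits - 1) - 1  # 1 for int2, 3 for int3, 7 for int4, 127 for int8
    scale = max(abs(x) for x in vec) / levels or 1.0
    return [round(x / scale) for x in vec], scale


def int_dot(qa, qb):
    """Approximate dot product reconstructed from two quantized vectors."""
    (a, sa), (b, sb) = qa, qb
    return sum(x * y for x, y in zip(a, b)) * sa * sb


def search(query, db_codes, db_ints, bits, shortlist=50, k=6):
    """Stage 1: Hamming-distance shortlist. Stage 2: rerank it by intN dot product."""
    qcode = binarize(query)
    by_hamming = sorted(range(len(db_codes)), key=lambda i: hamming(qcode, db_codes[i]))
    candidates = by_hamming[:shortlist]
    qint = quantize(query, bits)
    reranked = sorted(candidates, key=lambda i: int_dot(qint, db_ints[i]), reverse=True)
    return reranked[:k]


random.seed(0)
db = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(1000)]
db_codes = [binarize(v) for v in db]
db_int3 = [quantize(v, 3) for v in db]
query = [x + random.gauss(0.0, 0.3) for x in db[42]]  # noisy copy of entry 42

top = search(query, db_codes, db_int3, bits=3)
print(top[0])  # 42
print(bytes_per_entry(1) + bytes_per_entry(3))  # 192, as in the binary+int3 rerank row
```

The binary pass is cheap because it compares packed bits. The more precise intN scores are computed only for the shortlist, which is why the rerank rows can approach pure intN quality.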



Scoring Performance (per-query, all entries)

Raw vector scoring speed for each quantization format, with no model inference. Each format is timed over 20 iterations using random query vectors. Use these timings alongside the offline MRR table above to choose the best format.
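As a rough Python analogue of this timing loop, the sketch below times one full scoring pass over every entry for each of 20 random query vectors and reports the mean and minimum in milliseconds. The page itself runs in the browser, so absolute numbers will differ. The 500-entry toy database, the two scorers, and all function names are illustrative assumptions, not the page's code.

```python
import random
import statistics
import time

DIM, N = 384, 500  # hypothetical sizes; a real index holds far more entries
rng = random.Random(1)
entries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N)]


def pack_signs(vec):
    """1-bit code: bit i is set when dimension i is positive."""
    return sum(1 << i for i, x in enumerate(vec) if x > 0)


codes = [pack_signs(e) for e in entries]


def score_float(query):
    """Unquantized baseline: dot product against every entry."""
    return [sum(q * e for q, e in zip(query, entry)) for entry in entries]


def score_binary(query):
    """Binary format: similarity = number of matching sign bits.

    Packing the query is included in the timed region, since it is part of scoring.
    """
    qcode = pack_signs(query)
    return [DIM - bin(qcode ^ c).count("1") for c in codes]


def benchmark_scoring(score_all, dim, iterations=20, seed=0):
    """Time score_all(query) on `iterations` random queries; return (mean_ms, min_ms).

    Query generation happens outside the timed region, so only scoring is measured.
    """
    qrng = random.Random(seed)
    timings_ms = []
    for _ in range(iterations):
        query = [qrng.gauss(0.0, 1.0) for _ in range(dim)]
        start = time.perf_counter()
        score_all(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms), min(timings_ms)


results = {
    name: benchmark_scoring(fn, DIM)
    for name, fn in [("float", score_float), ("binary", score_binary)]
}
for name, (mean_ms, min_ms) in results.items():
    print(f"{name}: {mean_ms:.3f} ms/query (min {min_ms:.3f} ms) over {N} entries")
```

Reporting the minimum alongside the mean helps separate the format's intrinsic cost from occasional pauses such as garbage collection.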

End-to-End Latency

Full pipeline: model load + tokenization + inference + scoring + ranking.

Test	Status	Load (s)	Query avg (ms)	Notes
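One way to read these columns is as two separate measurements: one model load, timed in seconds, and the per-query remainder of the pipeline (tokenization, inference, scoring, ranking), averaged in milliseconds. Below is a minimal sketch of that bookkeeping, assuming this split. The stage functions (`load_model`, `tokenize`, `embed`, `score_all`) are hypothetical stand-ins built on toy two-dimensional vectors. Top-k ranking uses `heapq.nlargest` as one reasonable choice, not necessarily what WordFor does.

```python
import heapq
import time


def run_end_to_end(load_model, tokenize, embed, score_all, vocab, queries, k=6):
    """Return (load_s, query_avg_ms, top_k_of_last_query) for one end-to-end test."""
    start = time.perf_counter()
    model = load_model()  # timed once: the "Load (s)" column
    load_s = time.perf_counter() - start

    total_ms = 0.0
    top_k = []
    for text in queries:
        start = time.perf_counter()
        tokens = tokenize(text)                        # tokenization
        query_vec = embed(model, tokens)               # inference
        scores = score_all(query_vec)                  # scoring
        top_k = heapq.nlargest(k, zip(scores, vocab))  # ranking
        total_ms += (time.perf_counter() - start) * 1000.0
    return load_s, total_ms / len(queries), top_k


# Hypothetical toy "model": two-dimensional word vectors.
VECS = {"happy": [1.0, 0.0], "sad": [-1.0, 0.0], "fast": [0.0, 1.0], "slow": [0.0, -1.0]}
vocab = list(VECS)


def load_model():
    return VECS  # stand-in for fetching and initializing real model weights


def tokenize(text):
    return text.lower().split()


def embed(model, tokens):
    """Mean of known token vectors (assumes at least one token is in the model)."""
    known = [model[t] for t in tokens if t in model]
    return [sum(column) / len(known) for column in zip(*known)]


def score_all(query_vec):
    return [sum(q * v for q, v in zip(query_vec, VECS[w])) for w in vocab]


load_s, avg_ms, top = run_end_to_end(
    load_model, tokenize, embed, score_all, vocab, ["Feeling happy", "very fast"], k=1
)
print(f"load {load_s:.6f} s, query avg {avg_ms:.3f} ms, top {top}")
```

Keeping the load outside the per-query loop matters in a browser, where the first model download can dominate total time while steady-state queries take only milliseconds.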

