WordFor

How we use public-domain dictionaries without copying restricted data

The source policy behind WordFor

WordFor's results come from real dictionaries. But "real dictionaries" includes a messy mix of licenses — some truly public domain, some copyleft, some proprietary. To keep the visible product clean and redistributable, WordFor sorts every source into three lanes and is strict about which lane is allowed to put text in front of you.
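The three-lane sort can be pictured as a small classifier that defaults to the most restrictive lane. This is a minimal sketch, not WordFor's actual code: the `Lane` enum, the `Source` fields, the `classify` function and the license tag strings (`"public-domain"`, `"cc0"`, `"cc-by-sa"`) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    VISIBLE_CORE = 1  # public domain / openly redistributable: may be shown
    BUILD_SIGNAL = 2  # copyleft or non-redistributable: build-time signals only
    BLOCKED = 3       # GPL, proprietary, unverified, jurisdiction-sensitive


# Hypothetical license tags; the real policy's tag vocabulary is not published here.
VISIBLE_LICENSES = {"public-domain", "cc0"}
SIGNAL_ONLY_LICENSES = {"cc-by-sa"}


@dataclass(frozen=True)
class Source:
    name: str
    license: str
    edition_verified: bool = True
    jurisdiction_sensitive: bool = False


def classify(source: Source) -> Lane:
    """Assign a source to a lane, falling back to the most restrictive one."""
    # An unverified edition or jurisdiction-sensitive status blocks a source
    # regardless of what its license tag says.
    if not source.edition_verified or source.jurisdiction_sensitive:
        return Lane.BLOCKED
    if source.license in VISIBLE_LICENSES:
        return Lane.VISIBLE_CORE
    if source.license in SIGNAL_ONLY_LICENSES:
        return Lane.BUILD_SIGNAL
    # GPL, proprietary and unknown licenses all end up blocked.
    return Lane.BLOCKED
```

Defaulting to `Lane.BLOCKED` means a source with a license nobody has reviewed can never reach the visible core by accident.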

Lane 1: the clean public-domain core (visible)

The words and definitions you actually see come only from sources that are public domain or openly licensed for redistribution:

Each visible entry carries a source label, and an automated audit fails the build if any restricted text ever lands in this lane.
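A build audit of this kind could look roughly like the sketch below, assuming each visible entry records its source label and the build also has hashes of definitions from restricted sources. The `Entry` type, the `fingerprint`, `audit` and `run_audit` functions, and the hash-based comparison are hypothetical; exact hashing after case and whitespace normalization only catches near-verbatim copies, so a real audit might need fuzzier matching.

```python
import hashlib
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Entry:
    word: str
    definition: str
    source: Optional[str]  # source label shown next to the entry


def fingerprint(text: str) -> str:
    """Hash a definition after collapsing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def audit(entries: Iterable[Entry], visible_sources: Set[str],
          restricted_fingerprints: Set[str]) -> List[str]:
    """Return human-readable violations; an empty list means the lane is clean."""
    problems = []
    for e in entries:
        if e.source is None:
            problems.append(f"{e.word}: missing source label")
        elif e.source not in visible_sources:
            problems.append(f"{e.word}: source {e.source!r} is not a lane-1 source")
        if fingerprint(e.definition) in restricted_fingerprints:
            problems.append(f"{e.word}: definition matches restricted text")
    return problems


def run_audit(entries: Iterable[Entry], visible_sources: Set[str],
              restricted_fingerprints: Set[str]) -> None:
    """Print violations and exit non-zero so the CI build fails."""
    problems = audit(entries, visible_sources, restricted_fingerprints)
    if problems:
        for p in problems:
            print(p, file=sys.stderr)
        sys.exit(1)
```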

Lane 2: build-time-only signals (never shown)

Some excellent resources are copyleft (CC-BY-SA) or otherwise unsuitable for redistribution. WordFor still learns from them at build time — for ranking and quality scoring — but never copies their text into the product:

The distinction is deliberate: a ranking signal derived at build time is not the same as shipping someone else's copyrighted text.
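One way to keep a build-time signal separate from the text it came from is to reduce each restricted entry to a number and discard the definition immediately. The sketch below is a hypothetical illustration, not WordFor's ranking model: it assumes sense counts as the signal, and `build_signals`, `rank` and the `weight` of 0.3 are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_signals(restricted_entries: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Turn (headword, definition) pairs from a lane-2 source into per-word scores.

    Only the headword and a number survive; the definition text is read and dropped.
    """
    sense_counts: Counter = Counter()
    for headword, _definition in restricted_entries:
        sense_counts[headword.lower()] += 1
    if not sense_counts:
        return {}
    top = max(sense_counts.values())
    # Normalize to [0, 1] so the signal can be blended with other ranking features.
    return {word: count / top for word, count in sense_counts.items()}


def rank(candidates: List[Tuple[str, float]], signals: Dict[str, float],
         weight: float = 0.3) -> List[Tuple[str, float]]:
    """Reorder lane-1 candidates (word, base_score) using the signal as a boost."""
    return sorted(candidates,
                  key=lambda c: c[1] + weight * signals.get(c[0], 0.0),
                  reverse=True)
```

The candidates passed to `rank` still come from lane 1; the lane-2 data can change their order but contributes no visible text.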

Lane 3: optional / research-only packs (blocked from core)

Sources that are GPL-licensed, proprietary, edition-unverified, or jurisdiction-sensitive are kept out of the visible core entirely. Examples include the full GPL GCIDE, aggregator sites, and historical works whose specific scan or edition we haven't verified as public domain. Some may later ship as clearly labelled optional packs, but never silently in the core.
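The "never silently in the core" rule could be enforced by checking each pack manifest before packaging. This is a sketch under assumptions: the `PackManifest` fields, the lane numbers stored per source, and the `validate` function are hypothetical and not part of any documented WordFor format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PackManifest:
    name: str
    is_core: bool
    label: str = ""  # user-facing label; required for optional packs
    source_lanes: Dict[str, int] = field(default_factory=dict)  # source -> lane 1, 2 or 3


def validate(manifest: PackManifest) -> List[str]:
    """Return policy errors for a pack; an empty list means it may ship."""
    errors = []
    blocked = sorted(s for s, lane in manifest.source_lanes.items() if lane == 3)
    # Lane-3 sources may never feed the core build.
    if manifest.is_core and blocked:
        errors.append(f"core build includes lane-3 sources: {', '.join(blocked)}")
    # Optional packs must carry a visible label so users know what they enable.
    if not manifest.is_core and not manifest.label.strip():
        errors.append(f"optional pack {manifest.name!r} has no visible label")
    return errors
```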

Why this matters

For how those sources turn into a ranked list, see how WordFor ranks candidate words.
