At work, I've been doing a lot of testing with randomly generated data, which I generate by picking words from /usr/dict/words. That file contains many archaic words (due, I think, to its primary source: a 1911 edition of Roget's) and many archaic neologisms like "Microvaxes" and "BITNET". There are some words which appear only in lists of words, articles about spelling bees, and pieces of random text generated from the Unix word list.
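For the curious, picking random words from a word list might look something like the sketch below. This is not my actual test code; the function names, the `seed` parameter, and the one-word-per-line assumption about the file are all just for illustration.

```python
import random


def load_words(path="/usr/dict/words"):
    """Read a word list, assuming one word per line; blank lines are skipped."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.strip() for line in f if line.strip()]


def random_words(words, count, seed=None):
    """Pick `count` words at random, with replacement.

    Passing a seed makes the "random" data reproducible, which helps
    when a test fails and you want to see the same input again.
    """
    rng = random.Random(seed)
    return [rng.choice(words) for _ in range(count)]
```

Something like `random_words(load_words(), 10)` would then give ten words of test data. On many newer systems the list lives at /usr/share/dict/words instead, so the default path may need changing.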

One such word is "Boswellize". It appears to come from a 1911 encyclopedia entry (another source for /usr/dict/words), and its semantics have never been invoked in a Google-viewable sentence since. Even the encyclopedia entry, and this very weblog entry, only treat "Boswellize" as a word and don't actually use it to convey an idea. I think this word needs to go. I like words and all, but is it really necessary to have "Boswellize" as an official word? If anyone were to actually say it, its meaning would be obvious, just as it would be if I said "to Clintonize" or "to McDonaldsize". Why keep the word around when it won't earn its keep?

Some would say that what I desire has already been accomplished, that "Boswellize" has been eliminated from the marketplace of words as measured by its pitiful performance on Google. But it is clear that such people, while well-meaning, are deluded reactionaries. The word is still in /usr/dict/words, and it will be until a more recent encyclopedia than the one with that entry about Boswell passes into the public domain. This, I argue, is the true tragedy of copyright extension.

By the way, FRELI is a word list with part-of-speech information: useful if you want your random data to make some kind of grammatical sense.
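Here is a rough sketch of what part-of-speech information buys you. It does not read FRELI's actual file format (which I'm not describing here); the tiny `LEXICON`, the tag names, and the `TEMPLATE` are made up for illustration, standing in for whatever you'd load from a real tagged word list.

```python
import random

# Hypothetical lexicon: part-of-speech tag -> words with that tag.
# A real one would be built from a tagged word list; these entries
# and tag names are assumptions for the sake of the example.
LEXICON = {
    "adjective": ["archaic", "random", "obvious"],
    "noun": ["encyclopedia", "weblog", "word"],
    "verb": ["generates", "tests", "keeps"],
}

# A sentence shape expressed as a sequence of part-of-speech tags.
TEMPLATE = ["adjective", "noun", "verb", "adjective", "noun"]


def random_sentence(lexicon, template, seed=None):
    """Fill each slot in the template with a random word of that tag."""
    rng = random.Random(seed)
    words = [rng.choice(lexicon[tag]) for tag in template]
    return " ".join(words).capitalize() + "."
```

Calling `random_sentence(LEXICON, TEMPLATE)` gives a sentence shaped like adjective-noun-verb-adjective-noun: still nonsense, but nonsense that parses.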



Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.