A few days ago, John McGrath, Wordnik’s Director of Product Development, sent me a link to the preview version of Wordnik’s new thesaurus feature.  Wordnik, if you’re not familiar with it, is an online dictionary that integrates information from traditional dictionaries and online usage to give a more complete picture of a word’s meaning.  Merging these supervised and unsupervised data sources is of course a brilliant idea, and I think within a few years it will become a necessary part of any online dictionary.

I decided to test the Wordnik thesaurus with two types of words that often aren’t adequately represented in traditional thesauruses: colloquial phrasal verbs and insults.  The particular colloquial verb I tested was flesh out, which tends to pop into my head when I’m writing academically, as I want to first give an overview of the point I’m arguing, and then flesh it out.  Sadly, I’ve never found a synonym for flesh out that befits the tone of academic writing. Many thesauruses, even online ones, don’t list flesh out, and those that do haven’t given me enough alternatives to find a good one.  So I tried looking up flesh out on Wordnik, and I have to say it performed better than I expected.  It offered a few words that were pretty good equivalents (detail, fill in, round out, exposit), and, as would be expected from a semisupervised method, a few that were somewhat off (instance, set forth).  Still nothing that really fits my needs, but I’m not sure the word I’d be looking for even exists. (If you have any suggestions for a flesh out equivalent, let me know.)

The second test word was a common insult I employ in writing: imbecile.  The problem is that it’s so general; I often have situations where I want to make a quite specific insult, not merely to point out that someone is an imbecile, but also to specify the type of their imbecility (conscious ignorance, malicious misinformation, insufficient expertise, etc.).  Ever since I realized that “The Big Book of Being Rude” that I purchased on clearance at Half Price Books was woefully lacking in specific insults, I’ve been looking for a new source. I was hoping the thesaurus would suggest some more specific insults that I could record for later use in particular situations.

It seemed like this was a task that a thesaurus that monitored online usage would be preternaturally good at; after all, what does one do on the internet other than call people idiots?  Alas, this search didn’t go as well as flesh out, although the thesaurus still made a good effort.  Strangely, most of the responses were for imbecile as an adjective (which strikes me as comparatively rare) rather than a noun.  My main source of sadness was that it didn’t generate anywhere near the range of possibilities I’d expect in insults, offering mostly run-of-the-mill words like buffoon, dullard, or fool.  But it did offer two interesting ones with which I was unfamiliar. One was nidget, a now-forgotten word that lacked a single usage example.  The other was anile, which led me to uncover what I like to call the Great Anile Conspiracy — a strange and almost exciting phenomenon that I hope to detail in an upcoming post.  While the Wordnik thesaurus didn’t really give me a more specific insult, at least it tipped me off to two interesting words, so that’s something.

I realized, though, that expecting more specific insults from imbecile may have been an unfair query. I decided to try again with a more specific insult: blowhard.  The results were hit-and-miss.  The synonyms were spot-on: big mouth, blusterer, boaster, braggart, line-shooter, loudmouth, and (my personal favorite) vaunter.  The “words used in the same context” results weren’t, offering such words as Parker, valetudinarian, and book-review. How those occur in similar contexts to blowhard is opaque to me. However, I found its choice of ex-governor as a contextual neighbor of blowhard rather hilarious and surprisingly accurate; are there better examples of blowhards than Sarah Palin and Rod Blagojevich?

So all in all, the Wordnik thesaurus was worth checking out. It takes advantage of the capabilities of the Internet to offer both solid synonyms and noisier, possibly related words. Its algorithms aren’t perfect, of course, but the mistakes are mostly pretty reasonable and/or enjoyable. It hasn’t replaced my primary online thesaurus*, but it’s already interesting, and I’m looking forward to future developments that could make it supplant Roget’s in my heart.

*: I certainly hope that Wordnik hurries up and replaces my thesaurus of choice, now that I’ve read the Wall Street Journal’s blog post noting that it (well, its parent site) has the highest number of trackers of any of the top 50 most popular domains.

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

I'm Gabe Doyle, currently a postdoctoral scholar in the Language and Cognition Lab at Stanford University. Before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

In my research, I look at how humans manage one of their greatest learning achievements: the acquisition of language. I build computational models of how people can learn language with cognitively-general processes and as few presuppositions as possible. Currently, I'm working on models for acquiring phonology and other constraint-based aspects of cognition.

I also examine how we can use large electronic resources, such as Twitter, to learn about how we speak to each other. Some of my recent work uses Twitter to map dialect regions in the United States.

