
It looks like CNN credulously spat out another story from Global Language Monitor (GLM). Basically, GLM did their usual thing of running a speech (in this case, Obama's oil spill speech from mid-June) through some mindless statistics, getting out the Flesch-Kincaid Grade Level, and reporting it as though it were actually meaningful analysis. Language Log and Johnson have already explained why the GLM analysis is nonsense, and as a result, CNN substantially rewrote the story.

I discussed the meaninglessness of grade-level analysis in more depth a year and a half ago, but this time let me just offer an illustration of why grade-level analysis is not at all appropriate for political analysis. Here's a bit from the early part of Obama's address. It has a Flesch-Kincaid Grade Level of 10.2, a level that GLM said reflected Obama's "elite ethos".

“Already, this oil spill is the worst environmental disaster America has ever faced. And unlike an earthquake or a hurricane, it’s not a single event that does its damage in a matter of minutes or days. The millions of gallons of oil that have spilled into the Gulf of Mexico are more like an epidemic, one that we will be fighting for months and even years.”

Okay, but let me show you another passage that I’ve chosen to exactly match the above passage in Flesch-Kincaid Grade Level. It ought to be equally reflective of an elite ethos:

“one Gulf Already, America unlike millions even has oil does of spill ever be its minutes of not for disaster the And the single matter event earthquake we this epidemic, are a damage spilled The worst into environmental months it’s that or of a that of faced. will oil an is or like a hurricane, fighting Mexico more days. in an gallons that have and years.”

If the extent of your analysis is to look at grade levels, you're going to say that these two passages are equivalent. That's because the Flesch-Kincaid Grade Level formula is merely a weighted linear combination of two ratios: words per sentence and syllables per word. Since these two paragraphs contain exactly the same words and the same number of sentence-ending periods, the statistics are the same for each, and therefore any conclusion drawn about the first paragraph solely from these statistics necessarily must be drawn about the second paragraph as well.
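The point is easy to verify for yourself. Here's a minimal Python sketch; the syllable counter is a crude vowel-group approximation (real readability tools use more careful ones), but the argument doesn't depend on it. Because the formula uses only the totals of words, syllables, and sentences, shuffling a passage's words while keeping the same number of periods leaves the grade level exactly unchanged:

```python
import random
import re

def count_syllables(word):
    """Very rough syllable estimate: count runs of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/word) - 15.59
    Note that only the three totals matter, not word order."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

passage = ("Already, this oil spill is the worst environmental disaster "
           "America has ever faced. And unlike an earthquake or a hurricane, "
           "it's not a single event that does its damage in a matter of "
           "minutes or days.")

# Shuffle the words, then reinsert the same number of sentence-ending
# periods at an arbitrary point. The three totals are untouched.
tokens = passage.replace(".", "").split()
random.shuffle(tokens)
scrambled = " ".join(tokens[:25]) + ". " + " ".join(tokens[25:]) + "."

print(flesch_kincaid_grade(passage))    # same value...
print(flesch_kincaid_grade(scrambled))  # ...as this, every time
```

Any "elite ethos" the first score reflects is therefore equally reflected by the word salad, which is the whole problem.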

That’s the problem. These statistics and readability tests don’t look into word frequency, semantics, pragmatics, fluidity, rhetoric, style, or anything that actual humans do to assess the readability and meaningfulness of a text. The tests, after all, are intended as an approximation for when an informed analyst is not available, not as a data source in lieu of informed analysis.

To be fair, GLM's analysis doesn't stop at grade levels. They also offer the proportion of passive sentences in the address, which they report as "the highest level measured in any major presidential address this century". And that's something, except for Mark Liberman's discovery that it's nowhere near true. Bush's similar post-Katrina address had 17% passives; Obama's post-oil-spill address lagged behind with a mere 11%. (GLM's president, by the way, considers "There will be setbacks" to be a passive sentence, so it's not terribly surprising that their passive counts aren't accurate.) But even if the count were right, the passive proportion is not an inherently meaningful statistic either, because good writers employ passives for reasons other than evasion, which seems to be the only use GLM can come up with for them.

I hesitate to say that there is no useful information to be found by calculating simple statistics on major presidential addresses. But readability scores depend on the choice of punctuation for a speech, overlook rhetorical devices and structure, ignore word frequency and semantics, and haven't been shown to correlate very well with listener comprehension. It is unlikely that useful information will come from such simplistic analyses. And though it is not impossible that one day someone will find it, I have not yet seen a single informative result from grade-level or other simple statistical analysis of political speech.

Ben Zimmer has once again written a cutting post about Global Language Monitor, its absurd claim that the English language is about to get its millionth word, and the news sources that blindly regurgitate GLM's warmed-over press releases about it. I know it's become cliché, upon reading an article that one disagrees with, to ask "So this is what passes for journalism these days?" But articles like the BBC's really demand that question. Here's another, from the Telegraph, touting an obviously false claim: "One millionth English word could be 'defriend' or 'noob'."

First off, to the reporter’s credit, he manages to answer one question about GLM’s methodology; a word is a word by their count once it has been attested 25,000 times “by media outlets, on social networking websites and in other sources.” This information is not available on GLM’s website — I searched for 25,000, 25000, “twenty-five thousand”, “twenty five thousand”, “twentyfive thousand”, and “25 thousand” on the GLM website and didn’t get a single hit.  So kudos to the reporter for getting this nugget out!

But then the whole enterprise falls apart. The article notes that among the words GLM is "currently monitoring which could take English to the one million threshold" is noob. If that's the case, then GLM's monitors are incompetent. I popped over to MySpace, which surely would be included in any reasonable list of social networking sites, and lo! 145,000 hits. It's already a word by GLM's arbitrary standard! Who is GLM using to monitor the social sites? Clearly they ought to be fired. If noob, which has been in wide use by computer folks since the turn of the millennium, managed to slip under their noses, think of how many other unnoticed words there are! For all we know, English might have already passed this made-up milestone a month ago! To call this possibility a tragedy is an unacceptable understatement. And the claim that noob hadn't yet been used 25,000 times on the Internet — where it was born all those years ago! — didn't set off any alarms at the Telegraph?

How credulous can one be? Here’s the lead paragraph of the Telegraph article:

“The milestone will be passed at 10.22am on June 10 according to the Global Language Monitor, an association of academics that tracks the use of new words.”

And the last paragraph:

“The organisation first predicted that the millionth English word was imminent in 2006, and has repeatedly pushed back the expected date. Other linguist[s] have expressed scepticism about its methods, claiming that there is no agreement about how to classify a word.”

Of course if the first guess was only off by three years, it’s totally reasonable to assume the current guess is off by less than a minute.

Also, “other linguists” implies that Paul Payack is a linguist. He is not. I’m not even convinced he or his merry monitors can be called academics. They are entrepreneurs at best, and they are peddling nothing worth acknowledging.

About The Blog

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

About Me

I'm Gabe Doyle, currently a postdoctoral scholar in the Language and Cognition Lab at Stanford University. Before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

In my research, I look at how humans manage one of their greatest learning achievements: the acquisition of language. I build computational models of how people can learn language with cognitively-general processes and as few presuppositions as possible. Currently, I'm working on models for acquiring phonology and other constraint-based aspects of cognition.

I also examine how we can use large electronic resources, such as Twitter, to learn about how we speak to each other. Some of my recent work uses Twitter to map dialect regions in the United States.
