My dear girlfriend, realizing that I had been dangerously calm over the weekend, was kind enough to send me a CNN article that would boil my blood and burn my beans.  This article was basically a regurgitation of a press release, issued by Paul Payack of the Global Language Monitor, that performed some simple and simple-minded statistical analyses on the transcript of last Thursday’s Vice Presidential Debate.  The article/press release makes two big claims:

  1. Biden spoke at an eighth-grade level while Palin spoke at almost a tenth-grade level.
  2. Palin used far more passive sentences than Biden did, betraying a desire to obscure the similarities between her and Bush/Cheney.

In truth, neither of these claims really makes any sense, and I’m incredibly agitated that CNN would fall for this specious analysis.  In this post, I’ll discuss the problems with assessing “grade level” using readability tests, and I’ll follow it up shortly with one assaulting the passive issue.

Let me start off by addressing the idea that grade-level in speech, as measured by readability tests, is a meaningful measure of anything: IT’S NOT.  Payack’s analysis assigns grade levels based on a modified version of the Flesch-Kincaid readability test.  George Klare, all the way back in 1963, pointed out that most studies have shown that listener comprehension is not significantly affected by readability values from Flesch-Kincaid and similar tests.  Furthermore, Klare’s lack of effect was based on testing the comprehension of someone listening to a speaker reading pre-prepared text; presumably listener comprehension of an extemporaneous speaker would be even less well-correlated with readability of the transcript.

Why do I say that? Well, one thing about speech, unlike virtually all writing, is that speech does not contain punctuation.  Consider, for example, this response of Sarah Palin’s in the debate:

“I’ve been there. I know what the hurts are. I know what the challenges are. And, thank God, I know what the joys are, too, of living in America. We are so blessed. And I’ve always been proud to be an American. And so has John McCain.” [as punctuated by CNN transcribers]

Here we have three sentences beginning with and — or at least that’s the way CNN transcribed it. From my memory of that segment of the debate, Palin did end each of those phrases with a protracted pause that would be best marked with a period in the transcript. But at the same time, I would be reluctant to claim that they were all independent sentences — especially the last one, which is clearly a continuation of the next-to-last sentence.  Now, according to the standard Flesch-Kincaid readability score contained in Google Docs, this segment comes in with a 3.0 grade level.  That doesn’t sound too unreasonable; these are tight little sentences, each containing only a single thought, so we’d expect them to be easy to comprehend — although I would be rather surprised if the average third-grader could summarize this quote effectively.

But what if we join the and sentences with commas, as would be more standard in formal writing? Suddenly, the grade level rises to 5.0, a jump of two grade levels.  Not a huge jump, it seems, but remember that the claimed difference in grade levels between Biden and Palin was slightly less than that.  And boy, it didn’t seem to me that switching out those periods for commas made the Palin quote any less comprehensible.  Similarly, let’s look at a quote attributed to a State Department spokesman:

“I know that you believe that you understood what you think I said, but I am not sure you realize that what you heard is not what I meant.”

That is a hard sentence to understand, not because any of the words are difficult, but rather because the syntax is extremely complex.  This is reflected by its grade level, which according to Google Docs is 9.0.  But if we split the sentence in two by replacing the comma with a period, the grade level plummets to 3.0, because each of these two sentences is short, with short words.  And surely no one would say that this sentence was made at a third-grade level.

The problem is that grade level in the Flesch-Kincaid system is based on two numbers: words per sentence and syllables per word.  Both of these are being used as proxies for what really makes a sentence difficult to read: structural complexity and word familiarity.  But as we see here, creating a compound sentence can inflate the words-per-sentence ratio without increasing the sentence complexity, and chopping a complex sentence in two can deflate it without making anything easier to follow.  Klare makes this point in his article:

“Formulas appear to give scores accurate to, or even within, one grade-level.  Yet actually they are seldom this accurate.”
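
To make the words-per-sentence effect concrete, here is a rough Python sketch of the standard Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59.  The names fk_grade and count_syllables are my own, and the syllable counter is a crude vowel-group heuristic, so don’t expect it to reproduce the exact values Google Docs reports (and Payack’s modified version is presumably different again).  What it does show is how much the score moves when nothing changes except the punctuation.

```python
import re

def count_syllables(word):
    """Crude assumption: one syllable per run of vowels, minus a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def fk_grade(text):
    """Standard Flesch-Kincaid grade level of a piece of text."""
    # A "sentence" is whatever the transcriber chose to end with . ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# The Palin quote as punctuated in the CNN transcript.
spoken = ("I've been there. I know what the hurts are. "
          "I know what the challenges are. And, thank God, I know what "
          "the joys are, too, of living in America. We are so blessed. "
          "And I've always been proud to be an American. "
          "And so has John McCain.")

# Same words, but with the "And" sentences joined on with commas.
joined = spoken.replace(". And", ", and")

print(round(fk_grade(spoken), 1), round(fk_grade(joined), 1))
print(round(fk_grade(joined) - fk_grade(spoken), 2))
```

The two versions contain exactly the same words, so the syllables-per-word term is identical no matter how the syllables get counted.  The whole gap, a bit under two grade levels, comes from going from seven “sentences” to four, which is a decision made by whoever typed up the transcript and not by the speaker.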

There’s another shortcoming to blindly applying a readability test to extemporaneous speech, beyond the issue of having to decide for yourself what punctuation goes where.  Unlike writing, speech is full of errors.  Consider one of the statements that Gwen Ifill, the moderator, made during the debate.

“The House of Representatives this week passed a bill, a big bailout bill — or didn’t pass it, I should say.”

Clearly this sentence, if written rather than spoken, would have turned out more like

“The House of Representatives didn’t pass a big bailout bill this week.”

The spoken statement has a grade level of 8, according to Google Docs, but that value is inflated by the speech errors. The intended statement, with fewer words, is a full grade level lower.  Again, readability reveals itself to be inappropriate for extemporaneous speech, because it’s unclear how we should account for speech errors.

So we’ve got a double whammy here. Readability scores aren’t appropriate for assessing extemporaneous speech in the first place, and even if they were, the reported distinction is almost certainly less than the margin of error for the readability test.  No matter how you cut it, the distinction is illusory.  Why, it’s almost as foolish as docking a candidate for using passive sentences… but that’ll have to wait for Part 2.

[Note: the grade level of various politicians’ speech is a hotter issue than I’d initially realized.  The National Review’s Media Blog had a post the other day assessing Harry Reid’s speech to be at a sixth grade level.]

Summary: Readability tests like Flesch-Kincaid are inherently imprecise, even for written text.  When you try applying them to speech, the resulting number is pretty much meaningless.  A precise estimate of the difficulty of a sentence requires psycholinguistic testing, not just pressing F7 in Word. Please don’t attempt to win an argument by citing the grade level of your opponent’s speech.