A news story’s making the rounds this week that the members of the U.S. Congress have stopped talking at an 11th-grade level and have started talking at a 10th-grade level. This fits very neatly into the overall feeling that America is becoming ever more anti-intellectual, that Congress has become a group of petty and immature cliques who exist primarily to prevent each other from accomplishing anything, which is why the story has picked up steam. And perhaps these feelings are accurate, but this story doesn’t provide any evidence of it.
In short, the Flesch-Kincaid readability test that’s used in this analysis is completely inappropriate for the task.
I discussed this during the Vice-Presidential debates back in 2008, and Chad Nilep at the Society for Linguistic Anthropology and Mark Liberman at Language Log each talked about it in light of this new story. Here’s an updated set of arguments why the whole thing is nonsense.
How do we deal with speech errors? Speech has something that writing doesn’t have: disfluencies. Whether it’s a filled pause (uh, um, you know), a correction (We have — I mean, don’t have), an aborted phrase (I am a man with– I have goals), there’re lots of words that come through in speech that wouldn’t be in edited writing. Here’s an example from the 2008 debate, where Gwen Ifill said:
“The House of Representatives this week passed a bill, a big bailout bill — or didn’t pass it, I should say.”
That’s a sentence supposedly at the eighth-grade level. If we remove the mistakes & repetitions, we get a sentence that has now dropped a grade level. That’s the same drop that Congress supposedly has undergone. Maybe they just started editing the Congressional Record more tightly?
Grade levels aren’t based on content or ideas. The Flesch-Kincaid grade level calculation uses two statistics: syllables per word and words per sentence. These are imprecise stand-ins for want we really want, which is presumably the difficulty of the individual words and the complexity of the sentence structure. Word difficulty is going to be tied to their predictability in context, their frequency in the language, their morphological complexity, and other factors, all of which are loosely correlated with the number of syllables. Longer words will in general be more difficult, but there is a lot of noise in the correlation. Because we’re only using an estimate of the difficulty, our estimate of the grade level is inherently imprecise.
There is no punctuation in speech. There are lots of different ways to punctuate a speech. Is a given pause supposed to indicate a comma, a semicolon, or a period? The difference between these can be substantial; Nilep’s post shows how punctuating the speech errors as sentences of their own drop a sentence from grade level 28(!) to 10.
The rhetorical style of a speaker also comes into play here. Suppose Senator X and Senator Y deliver the same speech. Senator X uses a staccato style, where each clause becomes its own sentence. Senator Y uses a more relaxed and naturalistic style, combining some clauses with semicolon-ish pauses. Because the reading level calculation is based largely on number of words per sentence, Senator Y is going to get a much higher grade level, even though the only difference is in the delivery, not any of the content.
What does the grade level measure? The idea of grade-level estimation for writing was to give a quick estimate of how difficult a passage is to understand. The main readability scores were calibrated by asking people with known reading proficiency (as determined by a comprehension test or the grade level they were in) to read passages of various difficulty and to answer comprehension questions. The goal of the calibration was to get it so that if a piece of writing had a grade level of X, then people who read at the X level would be able to get some given percent of the comprehension questions right. Crucially, the grade level does not measure the content of the text, or the intelligence of the ideas it contains. In fact, for readability — the purpose the tests were developed for — a lower score is always better, assuming the same information is conveyed.
As I mentioned above, there’s a world of difference between reading and writing, so this calibration is probably invalid for speech. But if was valid, then we’d probably want to see the level go down.
The designers knew grade levels were imprecise measures. In a 1963 paper, George Klare wrote:
“Formulas appear to give score accurate to, or even within, one grade-level. Yet actually they are seldom this accurate.”
In a 2000 paper, George Klare wrote:
“Typical readability formulas are statistical regression equations, not mathematical identities, and do not reach that level of precision.”
I mention the two quotes here because they span 40 years of readability research, and the point remains the same. Grade-level assessment is somewhat informative, but it’s not very precise. You can be reasonably certain that a child will understand a third-grade level story better than a twelfth-grade level one. It is not nearly so certain that a tenth-grade level and eleventh-grade level story will be distinguishable. In fact, the Kincaid et al paper from 1975 that debuted the Flesch-Kincaid reading level calculation acknowledges its imprecision:
“Actually, readability formulas are only accurate to within one grade level, so an error of .1 grade level is trivial.”
Conclusions. So what we have here is a difference of 1 grade level (which is the edge of meaningfulness in ideal circumstances) when the reading level calculation is applied to speech, on which it is uncalibrated and in which we don’t have clear plans in place to account for the vagaries of punctuation and the issue of speech errors. Also, we have no data on the cause of the grade level decrease, whether it’s due to dumbing down, a push for clarity, or just new punctuation guidelines at the Congressional Record.
Which is to say, we have no reason to believe in this effect, nor to draw conclusions about its source, other than the unfortunate fact that we have a belief crying out to be validated.