
A news story’s making the rounds this week that the members of the U.S. Congress have stopped talking at an 11th-grade level and have started talking at a 10th-grade level. This fits very neatly into the overall feeling that America is becoming ever more anti-intellectual, and that Congress has become a group of petty and immature cliques who exist primarily to prevent each other from accomplishing anything; that’s why the story has picked up steam. And perhaps these feelings are accurate, but this story doesn’t provide any evidence for them.

In short, the Flesch-Kincaid readability test that’s used in this analysis is completely inappropriate for the task.

I discussed this during the Vice-Presidential debates back in 2008, and Chad Nilep at the Society for Linguistic Anthropology and Mark Liberman at Language Log each talked about it in light of this new story. Here’s an updated set of arguments why the whole thing is nonsense.

How do we deal with speech errors? Speech has something that writing doesn’t have: disfluencies. Whether it’s a filled pause (uh, um, you know), a correction (We have — I mean, don’t have), or an aborted phrase (I am a man with– I have goals), there are lots of words that come through in speech that wouldn’t be in edited writing. Here’s an example from the 2008 debate, where Gwen Ifill said:

“The House of Representatives this week passed a bill, a big bailout bill — or didn’t pass it, I should say.”

That’s a sentence supposedly at the eighth-grade level. If we remove the mistakes & repetitions, we get a sentence that has now dropped a grade level. That’s the same drop that Congress supposedly has undergone. Maybe they just started editing the Congressional Record more tightly?

Grade levels aren’t based on content or ideas. The Flesch-Kincaid grade level calculation uses two statistics: syllables per word and words per sentence. These are imprecise stand-ins for what we really want, which is presumably the difficulty of the individual words and the complexity of the sentence structure. A word’s difficulty is going to be tied to its predictability in context, its frequency in the language, its morphological complexity, and other factors, all of which are loosely correlated with the number of syllables. Longer words will in general be more difficult, but there is a lot of noise in the correlation. Because we’re only using an estimate of the difficulty, our estimate of the grade level is inherently imprecise.
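To make those two statistics concrete, here’s a minimal sketch of the Flesch-Kincaid Grade Level calculation, using the published coefficients. The vowel-group syllable counter is my own crude approximation, not the counter that Word, Google Docs, or GLM uses:

```python
import re

def count_syllables(word):
    # Crude approximation: count runs of vowels, discounting a final silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    # Grade = 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Nothing else enters the formula: not word frequency, not syntax, not meaning. A short common word and a long rare one count the same if they happen to have the same number of syllables.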

There is no punctuation in speech. There are lots of different ways to punctuate a speech. Is a given pause supposed to indicate a comma, a semicolon, or a period? The difference between these can be substantial; Nilep’s post shows how punctuating the speech errors as sentences of their own drops a sentence from grade level 28(!) to 10.

The rhetorical style of a speaker also comes into play here. Suppose Senator X and Senator Y deliver the same speech. Senator X uses a staccato style, where each clause becomes its own sentence. Senator Y uses a more relaxed and naturalistic style, combining some clauses with semicolon-ish pauses. Because the reading level calculation is based largely on number of words per sentence, Senator Y is going to get a much higher grade level, even though the only difference is in the delivery, not any of the content.
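The Senator X versus Senator Y scenario is easy to simulate. Here’s a sketch using a rough reimplementation of the formula (my own approximate syllable counter, not an official tool); the two “speeches” differ only in whether the clauses are punctuated as separate sentences:

```python
import re

def count_syllables(word):
    # Crude approximation: count runs of vowels, discounting a final silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

staccato = "We came. We saw. We conquered."   # Senator X: three sentences
combined = "We came, we saw, we conquered."   # Senator Y: one sentence

# Identical words and syllables; only the sentence count differs, so the
# gap is exactly 0.39 * (6/1 - 6/3) = 1.56 grade levels.
gap = flesch_kincaid_grade(combined) - flesch_kincaid_grade(staccato)
```

Same content, same delivery length; a grade and a half of difference from punctuation alone.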

What does the grade level measure? The idea of grade-level estimation for writing was to give a quick estimate of how difficult a passage is to understand. The main readability scores were calibrated by asking people with known reading proficiency (as determined by a comprehension test or the grade level they were in) to read passages of various difficulty and to answer comprehension questions. The goal of the calibration was to get it so that if a piece of writing had a grade level of X, then people who read at the X level would be able to get some given percent of the comprehension questions right. Crucially, the grade level does not measure the content of the text, or the intelligence of the ideas it contains. In fact, for readability — the purpose the tests were developed for — a lower score is always better, assuming the same information is conveyed.

As I mentioned above, there’s a world of difference between written text and speech, so this calibration is probably invalid for speech. But if it were valid, then we’d probably want to see the level go down.

The designers knew grade levels were imprecise measures. In a 1963 paper, George Klare wrote:

“Formulas appear to give score accurate to, or even within, one grade-level. Yet actually they are seldom this accurate.”

In a 2000 paper, George Klare wrote:

“Typical readability formulas are statistical regression equations, not mathematical identities, and do not reach that level of precision.”

I mention the two quotes here because they span nearly 40 years of readability research, and the point remains the same. Grade-level assessment is somewhat informative, but it’s not very precise. You can be reasonably certain that a child will understand a third-grade level story better than a twelfth-grade level one. It is not nearly so certain that a tenth-grade level and an eleventh-grade level story will be distinguishable. In fact, the Kincaid et al. paper from 1975 that debuted the Flesch-Kincaid reading level calculation acknowledges its imprecision:

“Actually, readability formulas are only accurate to within one grade level, so an error of .1 grade level is trivial.”

Conclusions. So what we have here is a difference of 1 grade level (which is the edge of meaningfulness in ideal circumstances) when the reading level calculation is applied to speech, on which it is uncalibrated and in which we don’t have clear plans in place to account for the vagaries of punctuation and the issue of speech errors. Also, we have no data on the cause of the grade level decrease, whether it’s due to dumbing down, a push for clarity, or just new punctuation guidelines at the Congressional Record.

Which is to say, we have no reason to believe in this effect, nor to draw conclusions about its source, other than the unfortunate fact that we have a belief crying out to be validated.


It looks like CNN credulously spit out another story from Global Language Monitor (GLM). Basically, GLM did their usual thing of running a speech (in this case, Obama’s oil spill speech from mid-June) through some mindless statistics, getting out the Flesch-Kincaid Grade Level, and then reporting it as though it was actually meaningful analysis. Language Log and Johnson already explained why the GLM analysis is nonsense, and as a result, CNN actually substantially re-wrote the story.

I discussed the meaninglessness of grade level analysis a year and a half ago in more depth, but this time let me just offer an illustration of why grade-level analysis is not at all appropriate for political analysis.  Here’s a bit from the early part of Obama’s address.  It has a Flesch-Kincaid Reading Level of 10.2, a level that GLM said reflected Obama’s “elite ethos”.

“Already, this oil spill is the worst environmental disaster America has ever faced. And unlike an earthquake or a hurricane, it’s not a single event that does its damage in a matter of minutes or days. The millions of gallons of oil that have spilled into the Gulf of Mexico are more like an epidemic, one that we will be fighting for months and even years.”

Okay, but let me show you another passage that I’ve chosen to exactly match the above passage in Flesch-Kincaid Grade Level. It ought to be equally reflective of an elite ethos:

“one Gulf Already, America unlike millions even has oil does of spill ever be its minutes of not for disaster the And the single matter event earthquake we this epidemic, are a damage spilled The worst into environmental months it’s that or of a that of faced. will oil an is or like a hurricane, fighting Mexico more days. in an gallons that have and years.”

If the extent of your analysis is to look at grade levels, you’re going to say that these two passages are equivalent. That’s because the Flesch-Kincaid Grade Level formula is merely a weighted linear combination of the number of words per sentence and the number of syllables per word. Since these two paragraphs contain the same words and the same number of sentences, the statistics are the same for each, and therefore any conclusion drawn about the first paragraph solely from these statistics necessarily must be drawn about the second paragraph as well.
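You can verify this equivalence mechanically. Here’s a sketch with a rough reimplementation of the formula (my own approximate syllable counter, not an official tool): shuffling the words of a sentence leaves the score exactly unchanged.

```python
import random
import re

def count_syllables(word):
    # Crude approximation: count runs of vowels, discounting a final silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

original = ("The millions of gallons of oil that have spilled into "
            "the Gulf of Mexico are more like an epidemic.")
words = original[:-1].split()   # drop the period, split into words
random.shuffle(words)
scrambled = " ".join(words) + "."

# Same words, same syllable total, same sentence count -> same score,
# no matter how thoroughly the meaning has been destroyed.
```

The word salad and the original prose are indistinguishable to the formula.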

That’s the problem. These statistics and readability tests don’t look into word frequency, semantics, pragmatics, fluidity, rhetoric, style, or anything that actual humans do to assess the readability and meaningfulness of a text. The tests, after all, are intended as an approximation for when an informed analyst is not available, not as a data source in lieu of informed analysis.

To be fair, GLM’s analysis doesn’t stop at grade levels. They also offer the proportion of passive sentences in the address, which they report as “the highest level measured in any major presidential address this century”. And that’s something, except for Mark Liberman’s discovery that it’s not nearly true. Bush’s similar post-Katrina address had 17% passives; Obama’s post-oil-spill address lagged behind with a mere 11%. (GLM’s president, by the way, considers “There will be setbacks” to be a passive sentence, so it’s not terribly surprising that their passive statistics aren’t great.) But even if the count were right, the passive proportion is not an inherently meaningful statistic either, because passives are employed by good writers for reasons other than evasion, which seems to be the only use GLM can come up with for them.

I hesitate to say that there is no useful information to be found by calculating simple statistics on major presidential addresses. But readability scores are dependent on the choice of punctuation for a speech, overlook rhetorical devices and structure, ignore frequency and semantics, and haven’t been shown to correlate very well with listener comprehension. It is unlikely that useful information will come from such simplistic analyses. And though it is not impossible that one day someone will find it, I have not yet seen a single informative result from grade level or other simple statistical analysis on political speech.

My dear girlfriend, realizing that I had been dangerously calm over the weekend, was kind enough to send me a CNN article that would boil my blood and burn my beans.  This article was basically a regurgitation of a press release, issued by Paul Payack of the Global Language Monitor, that performed some simple and simple-minded statistical analyses on the transcript of last Thursday’s Vice Presidential Debate.  The article/press release makes two big claims:

  1. Biden spoke at an eighth-grade level while Palin spoke at almost a tenth-grade level.
  2. Palin used far more passive sentences than Biden did, betraying a desire to obscure the similarities between her and Bush/Cheney.

In truth, neither of these claims really makes any sense, and I’m incredibly agitated that CNN would fall for this specious analysis.  In this post, I’ll discuss the problems with assessing “grade level” using readability tests, and I’ll follow it up shortly with one assaulting the passive issue.

Let me start off by addressing the idea that grade-level in speech, as measured by readability tests, is a meaningful measure of anything: IT’S NOT.  Payack’s analysis assigns grade levels based on a modified version of the Flesch-Kincaid readability test.  George Klare, all the way back in 1963, pointed out that most studies have shown that listener comprehension is not significantly affected by readability values from Flesch-Kincaid and similar tests.  Furthermore, that lack of effect was found when testing the comprehension of someone listening to a speaker reading pre-prepared text; presumably listener comprehension of an extemporaneous speaker would correlate even less well with the readability of the transcript.

Why do I say that? Well, one thing about speech, unlike virtually all writing, is that speech does not contain punctuation.  Consider, for example, this response of Sarah Palin’s in the debate:

“I’ve been there. I know what the hurts are. I know what the challenges are. And, thank God, I know what the joys are, too, of living in America. We are so blessed. And I’ve always been proud to be an American. And so has John McCain.” [as punctuated by CNN transcribers]

Here we have three sentences beginning with and — or at least that’s the way CNN transcribed it. From my memory of that segment of the debate, Palin did end each of those phrases with a protracted pause that would be best marked with a period in the transcript. But at the same time, I would be reluctant to claim that they were all independent sentences — especially the last one, which is clearly a continuation of the next-to-last sentence.  Now, according to the standard Flesch-Kincaid readability score contained in Google Docs, this segment comes in with a 3.0 grade level.  That doesn’t sound too unreasonable; these are tight little sentences, each containing only a single thought, so we’d expect them to be easy to comprehend — although I would be rather surprised if the average third-grader could summarize this quote effectively.

But what if we join the and sentences with commas, as would be more standard in formal writing? Suddenly, the grade level rises to 5.0, a jump of two grade levels.  Not a huge jump, it seems, but remember that the claimed difference in grade levels between Biden and Palin was slightly less than that.  And boy, it didn’t seem to me that switching out those periods for commas made the Palin quote any less comprehensible.  Similarly, let’s look at a quote attributed to a State Department spokesman:

“I know that you believe that you understood what you think I said, but I am not sure you realize that what you heard is not what I meant.”

That is a hard sentence to understand, not because any of the words are difficult, but rather because the syntax is extremely complex.  This is reflected by its grade level, which according to Google Docs is 9.0.  But if we split the sentence in two by replacing the comma with a period, the grade level plummets to 3.0, because each of these two sentences is short, with short words.  And surely no one would say that this sentence was made at a third-grade level.
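The comma-versus-period swap is easy to reproduce. With a rough reimplementation of the formula (my own approximate syllable counter, which happens to land near the Google Docs figures on this example), the punctuation is the only change:

```python
import re

def count_syllables(word):
    # Crude approximation: count runs of vowels, discounting a final silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

one_sentence = ("I know that you believe that you understood what you think "
                "I said, but I am not sure you realize that what you heard "
                "is not what I meant.")
two_sentences = one_sentence.replace("said, but", "said. But")

# Same 29 words and the same syllables; halving the words-per-sentence
# ratio drops the estimate from roughly grade 9 to roughly grade 3.
```

Nothing about the words or their arrangement changed; only the sentence-boundary bookkeeping did.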

The problem is that grade level in the Flesch-Kincaid system is based on two numbers: words per sentence and syllables per word.  But both of these are being used as proxies for what really makes a sentence difficult to read: structural complexity and word familiarity.  But as we see here, creating a compound sentence can inflate the words-per-sentence ratio without increasing the sentence complexity.  Klare makes this point in his article:

“Formulas appear to give score accurate to, or even within, one grade-level.  Yet actually they are seldom this accurate.”

There’s another shortcoming in blindly applying a readability test to extemporaneous speech, beyond the issue of having to decide for yourself what punctuation goes where.  Unlike writing, speech is full of errors.  Consider one of the statements made during the debate by Gwen Ifill (the moderator).

“The House of Representatives this week passed a bill, a big bailout bill — or didn’t pass it, I should say.”

Clearly this sentence, if written rather than spoken, would have turned out more like

“The House of Representatives didn’t pass a big bailout bill this week.”

The spoken statement has a grade level of 8, according to Google Docs, but that value is inflated by the speech errors. The intended statement, with fewer words, is a full grade level lower.  Again, readability reveals itself to be inappropriate for extemporaneous speech, because it’s unclear how we should account for speech errors.
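A rough reimplementation of the formula (my own approximate syllable counter, not Google Docs’) shows the same inflation on Ifill’s sentence:

```python
import re

def count_syllables(word):
    # Crude approximation: count runs of vowels, discounting a final silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

spoken = ("The House of Representatives this week passed a bill, a big "
          "bailout bill -- or didn't pass it, I should say.")
edited = "The House of Representatives didn't pass a big bailout bill this week."

# The false start and self-correction pad the word count, and every extra
# word in the single sentence pushes the grade estimate upward.
```

The disfluencies alone buy Ifill more than a grade level she never earned.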

So we’ve got a double whammy here. Even if readability scores were appropriate for assessing anything about extemporaneous speech, the reported distinction is almost certainly less than the margin of error for the readability test.  No matter how you cut it, the distinction is illusory.  Why, it’s almost as foolish as docking a candidate for using passive sentences… but that’ll have to wait for Part 2.

[Note: the grade level of various politicians’ speech is a hotter issue than I’d initially realized.  The National Review’s Media Blog had a post the other day assessing Harry Reid’s speech to be at a sixth grade level.]

Summary: Readability tests like Flesch-Kincaid are inherently imprecise, even for written text.  When you try applying them to speech, the resulting number is pretty much meaningless.  A precise estimate of the difficulty of a sentence requires psycholinguistic testing, not just pressing F7 in Word. Please don’t attempt to win an argument by citing the grade level of your opponent’s speech.

About The Blog

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

About Me

I'm Gabe Doyle, currently an assistant professor at San Diego State University, in the Department of Linguistics and Asian/Middle Eastern Languages, and a member of the Digital Humanities. Prior to that, I was a postdoctoral scholar in the Language and Cognition Lab at Stanford University. And before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

My research and teaching connects language, the mind, and society (in fact, I teach a 500-level class with that title!). I use probabilistic models to understand how people learn, represent, and comprehend language. These models have helped us understand the ways that parents tailor their speech to their child's needs, why sports fans say more or less informative things while watching a game, and why people who disagree politically fight over the meaning of "we".
