“Using data as a singular is wrong,” writes Barbara Wallraff in Word Court. I agreed with this in my middle-school days, when I strove to show that I was destined for great things and illustrated it by being painfully, officiously, incessantly prim. I went to science fairs, and on the little note cards that carried my polished speech about bacteria, or supernovae, or whatever it was I was claiming to have made an important discovery about, I usually wrote “The data show that …” just before revealing the results that I expected would blow the judges away. Surprisingly, though, they weren’t shocked by results such as “oil-eating bacteria do indeed eat oil” nor amazed by my revelations that “salt-water shrimp die in fresh water”. But maybe they were just too distracted by the phrase “the data show” to actually listen to what it was that the data were showing.
I stopped generally treating data as a plural a few years ago, because no matter how many times I used it, it always sounded like I was putting on airs. I know, I know — data entered the language as the plural of the Latin borrowing datum, and therefore data forever should be a plural in English. But it’s really not so simple as that. I’m not about to argue that data are is wrong. But I am going to argue that there are some reasonable reasons to accept data is.
Exhibit A: the acceptability of — in fact, the preference for — data is in certain circumstances. There are two major senses for the word data. The original sense is a collection of numbers, facts, results, etc. from experiments and observations, as in (1). The other sense is a collection of information stored on a computer, usually in binary form, as in (2).
(1) Lack of data is killing (macro)economics
(2) Method to increase the amount of customer data on a hard disk drive
The second sense of data is a mass noun; it sounds quite odd to say “I have a data/datum on this hard drive”. It’s like mail, milk, money, and some non-m words as well. Mass nouns receive singular agreement:
(3a) Your mail is/*are sitting on the table.
(3b) The data on these hard drives is/*are corrupt.
So for this computerized sense, data is is not only acceptable, but strongly preferred. (There are a few instances of plural agreement with computer data, but these are quite rare.) Now here’s the problem: nowadays it’s awful hard to separate the two senses of data. I, for instance, build computer models of human language usage. So my data is a collection of facts in the world that is represented as a collection of binary digits on a computer disk; I could be using either sense of data to describe it. So what’s the problem with choosing to treat it as a mass noun, if that’s one possible form for it?
Exhibit B: other Latinate words have shed their plural history for the singular. Most prominent amongst these is agenda. Yes, agenda, meaning the set of points to be discussed in a meeting, the set of things to do in the future, or the book in which a calendar is kept. Agenda is treated as a singular noun, with agendas as its plural. Agendum, the “proper” singular form, has pretty well disappeared from English. Surprisingly, this transition has NOT led to linguistic anarchy, nor any other notable harm to the language or its speakers. It seems to be safe to allow data to follow the same path.
Exhibit C: there is not always agreement between semantics and syntax.
(4a) Where are my pants?
(4b) My scissors have rusted.
(4c) I own many pairs of plaid shorts.
What do these sentences have in common? Each refers to a semantically singular object with plural syntactic agreement. Note that each of these objects is composed of two parts, but each undeniably functions as a whole. A pair of pants is not like a pair of shoes in that regard, because you could have a single shoe, but not a single pant — that’s a pant leg. (I’m ignoring Express’s Editor Pant here.) So if we English speakers are willing to tolerate a single object taking plural agreement, why can’t we tolerate the “plural” data taking singular agreement?
Exhibit D: Lastly, when I’m speaking of data as a linguist, I’m not just talking about a set of facts, but rather a collection of facts, observations, arguments, and analyses. It is rare, in this day and age, that the points of data in an experiment, taken alone, can justify a claim. (I’m pretty sure this is the case in most fields.) The fact that people take longer to read certain words in a certain task does not, in and of itself, establish that these words are harder than others. Rather, this fact, combined with a set of assumptions and analyses that we all agree to accept, establishes the claim. We’re viewing the datums as a sort of team, all working together. In that sense, the data is an inter-related mass, rather than a series of separable points of data. Thus, data can reasonably be thought of as a mass noun or a collective noun (like family or team), either of which would take singular agreement in Standard American English.
That’s four reasons why I think data is should be all right. Insisting that people should say data are, in spite of the fact that an American English speaker can’t use data are without sounding pretentious or outmoded, is stupid. You’re welcome to keep using it, but stop making other people use it too. I don’t see any harm coming to the language based on how you use data. I don’t see any improvement to the language either. Go with what feels right. I’m guessing that’s data is.
Summary: A lot of people insist that data is is unacceptable. But there’re at least four reasons why data is should be fine. So if you think, like I do, that data is works better than data are, well, go ahead and use it! The same holds if you think data are is better. But it’s stupid to argue that only one or the other is correct.
***
The Stupid Grammar Rules series as it stands:
- I: Email vs. e-mail (04/11/08)
- II: data are (08/11/08)
13 comments
August 11, 2008 at 4:00 pm
Jonathon
Don’t forget stamina.
I think I might add exhibit E: the scarcity of the word datum. It turns up a lot of Google hits, but most of them seem to be dictionary definitions, particularly for what appears to be a specialized sense in the field of geology (with the plural datums). It’s harder to argue that one form is the plural of another if the alleged singular form has a marginal existence at best. (Though there are exceptions, of course, as exhibit C shows.)
August 11, 2008 at 4:49 pm
Gabe
Wow, Language Hat really scored there! It never occurred to me that “stamina” might be plural. And the scarcity of “datum” is a real selling point for “data is”, too.
August 13, 2008 at 11:43 am
goofy
And agenda, bacteria, bus, candelabra, erotica, graffiti, paraphernalia, trivia.
August 13, 2008 at 7:17 pm
mike
I’ve found the “data are” test to be a pretty good indicator as to whether it’s worthwhile to discuss grammar with a person. If they (<– !!) insist on “data are” because “that’s how it’s done in Latin,” it seems safe to assume that it isn’t really a discussion, just some peevological rantings, i.e., how things “should be” as opposed to how they actually are.
Also: opera.
August 24, 2008 at 8:10 am
Bill Brohaugh
It’s like the beginning of the old counting rhyme: One po-datum, two po-data, three po-data, four. Or maybe not.
August 26, 2008 at 4:17 pm
Gabe
mike: that is a litmus test I can get behind.
August 27, 2008 at 3:59 am
Everything You Know About English Is Wrong » Stamina R Us
[…] I thought until I stumbled across a fascinating post in the Languagehat blog archives (which the “Stupid Grammar Rules II: Data Are” post at the Motivated Grammar blog pointed me to). Languagehat explains that stamina is plural, and then concludes: Heretofore, when […]
October 28, 2010 at 8:32 am
RichieP
There is a reason that “data are” sounds strange. That is because people know intuitively that “data” is being used as a mass noun, even if they don’t know what a mass noun is. You’ll notice that data is almost always quantified by units of measure — not by a number of “datums.” (Correct: This drive contains 160 gigabytes of data. Correct: We have collected pages and pages of data. Correct: The satellite has returned 437 days worth of data. Awkward: The satellite has returned 437 data, and tomorrow we expect to receive an additional datum.)
To those who insist that the English word “data” is always a count noun just because the Latin “inspiration” for it was a count noun: I challenge you to go around saying, “A gigabyte is FEWER data than a terabyte.”
February 9, 2012 at 12:35 pm
Jill M
“(1) Lack of data is killing (macro)economics.”
In this example, it is the “lack” that “is” killing macroeconomics, not the data. So, it’s not an example of a time when saying “data are” is awkward. Just sayin’. :)
April 3, 2012 at 7:40 pm
Grammar Rules are Stupid, but Still… « Illusions of Grammar
[…] thing to concoct an in-depth grammatical argument as to why you could say something either way. There are people out there like this. They live sad and lonely lives. For a good reason. […]
August 20, 2012 at 5:24 am
Pros master the etiquette of agreement « Pros Write
[…] Neither usage is wrong. I personally prefer to treat “data” as singular. You can view a thoughtful supporting argument at Motivated […]
August 28, 2012 at 12:07 pm
bonnie
As an ecology grad student, using data as a plural is drilled into us. When I intend it as a mass noun, I just use “dataset” (or data sets) instead. Thus: “The data show…” but “the dataset shows…” It works well: no editors have nailed me on improper use of the word data.
November 29, 2014 at 7:07 am
W Stein
This post from http://english.stackexchange.com/questions/126344/data-as-a-plural-noun
clearly shows why insisting that “data” is plural “because of Latin grammar” is an entirely false argument:
“In the Latin nominative case, data could be either the neuter plural or the feminine singular of datum. The neuter singular was datum, the masculine singular datus, the feminine plural datae, and the masculine plural dati.” -TortoiseWrath.com