Google+, Google’s answer to Facebook, has been generating a ton of buzz in its brief invitation-only phase. That’s about all I know about it; I’ve intentionally been avoiding investigating further. It doesn’t have FarmVille, so what’s the point? But I’m on Twitter too much to avoid Google+ entirely. I’d been getting 140-character updates about its importance or awesomeness from a variety of sources, but what finally got me to look into it was an update from an unexpected quarter: Ben Zimmer, with a tweet about the morphology of +1.

The +1 button on Google and Google+ is basically a generalization of Facebook’s “Like” button, indicating “what you like, agree with, or recommend on the web.” The trouble is that users are going to want to use +1 in more general contexts, treating the word* +1 as a stand-alone noun, verb, and so on. This already happened with Facebook’s Like, and there it was a pretty seamless process, since the new meaning of like could piggy-back on the morphology of the existing word like, resulting in likes, liked, liking, etc.

+1 doesn’t have this same ability, at least in text. Plus-one exists as a word in English, referring to “A person who accompanies another to an event as that person’s nominated guest, but who has not been specifically invited” (OED) — e.g., your date for an event. This word has its morphology basically worked out (plus-ones is used in the OED’s first attestation, back in 1977, and here’s an example of “plus-oned the alloys”, whatever that means). The trouble, though, is that the word isn’t written plus-one; it’s written +1. The pronounced forms are all worked out, but the written form is unestablished.

Credit is due to Google for recognizing this and wanting to establish the conventions. In their +1 help, they explain their spelling conventions, in which the morphologically complex forms are formed with apostrophes — +1’s, +1’d, +1’ing — rather than the plain forms +1s, +1d, +1ing. In so doing, they raised the hackles of some grammarians, so let’s look at each of the forms individually to try to explain the choice.

+1’s. Apostrophe-s is a standard way to pluralize nouns with strange forms, such as letters, numerals, acronyms, or abbreviations. This introduces ambiguity with the possessive form, but it avoids other ambiguities (such as pluralized a looking like the word as) and often looks better (I think Ph.D.s looks weird). Thus we see mind your p’s and q’s, multiple Ph.D.’s, and Rolling 7’s and 11’s. +1 ends in a numeral, so it’s not unusual to write it as +1’s instead of +1s, although either is acceptable. (For more on apostrophes in plurals, see this old post.)

+1’d. Apostrophe-d for the past tense is not as common as apostrophe-s for the plural, but it’s certainly not unheard of. Fowler’s Modern English Usage favors it for words ending in a fully pronounced vowel — forming mustachio’d instead of mustachioed, for example — in order to avoid a strange collocation of vowels clogging the end of the word. However, this appears to be a minority position; mustachioed generates about 35 times more Google hits than mustachio’d.

"Wait, lads! Am I being shanghaied or shanghai'd?"

Apostrophe-d used to be a more general suffix, up until around the middle of the 19th century (judging by the Corpus of Historical American English). In Middle English, the -ed suffix was always pronounced with the vowel, and in Early Modern English, the vowel was optional in some words where today it is obligatorily omitted. If you’ve ever heard someone described as learned, pronounced /learn-ED/ instead of /learnd/, you’ve heard one of the few remaining vestiges of this alternation. While this variation existed, it was useful to have different written forms to indicate whether the vowel was pronounced or not.

I first learned of this reading a Shakespeare play in which certain words were written as, for instance, blessèd, with an accent indicating that the second e was to be pronounced so that the meter of the line was correct. To clarify cases where the vowel was not to be pronounced, poets and playwrights would sometimes vanish the e into an apostrophe. This edition of Hamlet, for instance, includes both drowned and drown’d on the same page when different characters are talking about the death of Ophelia:

Queen: Your sister’s drown’d, Laertes.
Clown: Argal, she drowned herself willingly.

But historical usage is dead, so perhaps the more relevant comparison is looking at other numerical verbs. The only one that comes to my mind is 86, meaning to eject or reject something. Looking around, I see both 86’d and 86ed used, with 86’d appearing to be a bit more common. The Wikipedia entry for 86 only has 86’d attested, and there’s also a book titled 86’d. At the very least, 86’d is an acceptable variant, and seemingly the more common as well. In that case, it’s not surprising that Google would choose +1’d over +1ed or +1d.

+1’ing. Lastly, we have the present participle. There isn’t a historical component to this usage like there was for the past tense. The apostrophe-ing form is attested for 86, appearing in the book Repeat Until Rich, but 86ing without the apostrophe looks to be a little bit more common on the web as a whole.** The trouble is that 86(’)ing just isn’t well-attested in either form. Unlike the plural and past tense, there isn’t much of a precedent for apostrophe-ing, and in fact there doesn’t seem to be much of a precedent for the present participle of a numeral in general. I think that the choice to include the apostrophe in the present participle was made strictly for consistency’s sake; I doubt many people would prefer the paradigm +1’s, +1’d, +1ing to the more consistent one they chose.

The future. Of course, it doesn’t really matter what Google says, just as it doesn’t really matter what Strunk & White or Fowler or I or any other language commentator says. Language is what people do with it. Personally, I suspect that the apostrophes will disappear fairly quickly. Even in typing this, I kept on being annoyed that I had to send a finger out in search of an apostrophe. When you’re writing something often, you want to toss out unnecessary stuff — Facebook is a good example of this; when I first ended up on it back in 2004, you still had to type thefacebook.com to get to it, but that unnecessary the was quickly lost. As people become more familiar and comfortable with +1 and its inflected forms, the need (and the desire) for the apostrophes will ebb, and I think we’ll see +1s dominate. In fact, even typing +1 is kind of a pain (I keep accidentally typing +!), so I wouldn’t be surprised to see plus-ones, or even pluses, eventually become the standard.

*: I’m going to call +1 a word in this post, though you may find it more of a phrase. The key point is that it has a specific meaning that is not a simple sum of its component morphemes (plus and one), and that makes it word-like for my purposes.

**: 86’ing doesn’t appear in the Google N-grams corpus, suggesting it appeared less than 40 times in a trillion words. 86ing appears there with 962 hits.

The explosion of data available to language researchers in the form of the Internet and massive corpora (e.g., the Corpus of Contemporary American English or the British National Corpus) is, I think, a necessary step toward a complete theory of what the users of a language know about their language and how they use that information. I became convinced of this with Joan Bresnan’s work on the dative alternation — which I’ve previously fawned over as the research that really drew me into linguistics — in which she and her colleagues show that people unconsciously combine multiple pieces of information during language production in order to make probabilistic decisions about the grammatical structures they use. This went against the original idea (which many grammaticasters still hold) that sentences are always either strictly grammatical or strictly ungrammatical. Furthermore, it showed the essential wrongness of arguing that one structure is ungrammatical on analogy to another structure. After all, if (1a) is grammatical, by analogy (1b) has to be as well, right?

(1a) Ann faxed Beth the news
(1b) ?Ann yelled Beth the news

That’s not the case, though.* There are a lot of different factors affecting grammaticality in the dative alternation, including the length difference between the objects, their animacy and number, and even the verb itself. But this conclusion was only reached by using a regression model over a large corpus of dative sentences. This regression identified both the significant features and their effects on the alternation proportions. In addition, having the corpus allowed the researchers to find grammatical sentences that broke previously assumed rules about the dative alternation, showing that the assumed rules were false. Prior to the corpus study, people thought they mostly understood this alternation; now that we have one, the results are quite different from what we’d been saying.
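To make the idea of a probabilistic grammatical decision concrete, here’s a toy sketch of how a logistic regression combines several cues into a probability of choosing one dative structure over the other. This is not Bresnan and colleagues’ actual model or data; the features and weights are invented purely for illustration.

```python
import math

def p_double_object(length_diff, recipient_animate, theme_pronoun):
    """Toy probability of the double-object structure ("faxed Beth the news")
    rather than the prepositional one ("faxed the news to Beth").
    All weights below are hypothetical."""
    # Invented weights: shorter and animate recipients favor the
    # double-object structure; pronominal themes disfavor it.
    z = 0.5 - 0.8 * length_diff + 1.2 * recipient_animate - 1.5 * theme_pronoun
    return 1 / (1 + math.exp(-z))

# A short, animate recipient yields a high probability of the double object...
high = p_double_object(length_diff=-2, recipient_animate=1, theme_pronoun=0)
# ...while a long, inanimate recipient plus a pronominal theme yields a low one.
low = p_double_object(length_diff=3, recipient_animate=0, theme_pronoun=1)
```

The point of the sketch is the output type: a probability between 0 and 1, rather than a binary grammatical/ungrammatical verdict.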

And this illustrates the power and downright necessity of corpora to descriptivist linguistics (i.e., linguistics). Sure, it might seem obvious that if you really want to describe a language, you need to have massive amounts of data about the language to drive your conclusions. But for almost the whole history of linguistics, we didn’t have it, and had to make do with extracted snippets of the language and imagined sentences, which are susceptible to all kinds of biases and illusions. Having the corpora available and accessible can save us from some of these biases.

But, of course, corpora can introduce biases of their own. Corpora are imperfect, and in general they still must be supplemented by value judgments and constructed examples. An example that I once had the pleasure of seeing Ivan Sag and Joan Bresnan discuss was that if we go by raw word counts, the common typo teh was as much a word in the 1800s as crinkled. Similarly, if we were to turn linguistics over to corpora entirely and only accept observed sentences as grammatical, then I swept a sphere under the fogged window would be ungrammatical, since it has no hits on Google (at least until this post is indexed). Corpora are treasure troves, but as a quick review of the Indiana Jones series will remind you, treasure troves are laden with pitfalls and spikes.

Yep, this is exactly the sort of danger I face in my day-to-day research.

I was reminded of this when I looked up the historical usage of common English first names to look for rises and falls in their popularity. I looked up Brian in the Google Books N-grams, and found a spike that represents what I like to call the era of Brian:

Hmm, something sent Brian usage through the roof in the late 1920s, only to come crashing back down like the stock market (might they have been linked?!). Time to investigate further in the Corpus of Historical American English (COHA):

Oh wait, never mind, the era of Brian wasn’t in the 1920s; it was in the 1860s (and presaged in the 1830s). Wait, what? Let me go back to Google N-grams:

Oh dear, it’s spreading! What is happening? What is the meaning of Brian?!

The fact is, as you surely already knew, that there was no era of Brian. The variability of the length of the era in the first and third graphs is due to my changing the smoothing factor on the graph. The source of the spike is that in one year the proportion of “Brian” in the corpus shot up to around 10 to 20 times its base level. (This becomes clear if you look at the unsmoothed numbers.) And if we look at the composition of the corpus at that point (1929), it turns out that the Google Books corpus contains a 262-page book titled “Brian, a story”, which seems like it would account for this surge. The COHA corpus has a similar thing going on; two books in 1832 and 1834 have prominent characters with the name Brian, and 1860 has a book titled “Brian O’Linn”.
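If you’ve never fiddled with the N-grams smoothing setting, here’s a minimal sketch of what it does: each year’s value is replaced by the average of a window of surrounding years. (This mimics the behavior for illustration; it isn’t Google’s actual code.) Notice how a single-year spike gets smeared across the whole window, manufacturing an apparent “era”.

```python
def smooth(counts, radius):
    """Symmetric moving average; radius=0 returns the raw series."""
    smoothed = []
    for i in range(len(counts)):
        window = counts[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# A flat baseline with a single-year spike, like "Brian" in 1929:
series = [1.0] * 10 + [20.0] + [1.0] * 10

raw = smooth(series, 0)      # the spike occupies exactly one year
blurred = smooth(series, 3)  # the same spike now spans seven years
```

With radius 3, every year within three years of the spike averages it into its window, so one anomalous book becomes a seven-year bump.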

And that’s one of the problems of corpora. Sure, they’re full of far more linguistic information than the little sampling we used to use, but they’re still incomplete, and they are not a statistically independent sample of the full range of language. If these corpora contained the whole of all writing published in these years, the Brian spike would be negligible, but because of the inherently incomplete nature of corpora, a single book can have an inordinate effect on the apparent proportions of different words.
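A back-of-the-envelope calculation shows how easily this happens. All of the numbers below are invented (the real per-year corpus counts aren’t given here); the point is just the arithmetic of one Brian-heavy novel landing in a modest yearly slice.

```python
# Hypothetical 1929 slice of the corpus, before the novel is added:
corpus_tokens = 50_000_000   # total words in that year's slice
background_brians = 500      # baseline mentions of "Brian"

# One 262-page novel with a protagonist named Brian:
novel_tokens = 80_000
novel_brians = 1_500         # the name appears on most pages

before = background_brians / corpus_tokens
after = (background_brians + novel_brians) / (corpus_tokens + novel_tokens)

spike_ratio = after / before  # apparent frequency jumps roughly fourfold
```

The novel adds a fraction of a percent to the slice’s word count but triples the name’s count, so the proportion leaps even though nothing changed in how English speakers were using the name.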

Corpora are great, but they’re also noisy, and they do require interpretation. I didn’t get that at first, and thought that interpreting corpus data was invariably infecting it with one’s own prejudices. And yeah, that’s a danger, writing off real phenomena that you don’t believe in because you don’t believe in them. But the answer isn’t to accept the corpus data as absolute truth. You have to be as skeptical of your corpora as you are of your constructed examples. And that’s advice, I’m sure, that very few of you will ever need.

*: If you find (1b) to be perfectly grammatical, that’s fine. I think you’ll find other examples in the paper that you consider less than perfectly grammatical but have grammatical analogues. And even if you don’t, the data will hopefully assure you that other people do.

About The Blog

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

About Me

I'm Gabe Doyle, currently an assistant professor at San Diego State University, in the Department of Linguistics and Asian/Middle Eastern Languages, and a member of the Digital Humanities. Prior to that, I was a postdoctoral scholar in the Language and Cognition Lab at Stanford University. And before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

My research and teaching connect language, the mind, and society (in fact, I teach a 500-level class with that title!). I use probabilistic models to understand how people learn, represent, and comprehend language. These models have helped us understand the ways that parents tailor their speech to their child's needs, why sports fans say more or less informative things while watching a game, and why people who disagree politically fight over the meaning of "we".
