One of my labmates was practicing his talk for an upcoming conference, looking at the predictability of different continuations of a sentence. Showing a logarithmic graph of word frequencies, he remarked that at one end of the scale, the words were one-in-a-million continuations. None of us were surprised. That’s one of the neat things about being a computational psycholinguist; we deal with one-in-a-million occurrences so often that they’re mundane.

But as I thought about it, I realized that one-in-a-million events shouldn’t be so surprising to any language user. Consider the following sentences:

(1a) The eulogy started sweet and was joined by a chorus of sniffles […]
(1b) The dignified woman, who came to the shelter in February, taught a younger woman to tell time […]
(1c) The entrenched board is made up of individuals […]
(1d) The kitties hid inside the space under the dishwasher […]

Each of the above sentences starts out with a one-in-a-million event; if you look at a million sentences starting with The, on average there will be one sentence with eulogy as the second word, one with dignified as the second word, and so on. Those might not feel like one-in-a-million words, so let me go a little into how we get the odds of a certain word. (You can skip the next section if you already know about corpus probabilities.)

The odds of a word. I used the Google N-gram corpus, a collection of a trillion words from the Internet. It’s called the n-gram corpus because it gives counts of n-grams, which are just phrases of n words. A 1-gram (or unigram) is just an individual word, a 2-gram (or bigram) is something like the house or standing after, and so on for larger values of n. N-grams are really useful in natural language processing because they are an easy-to-use stand-in for more complicated linguistic knowledge.

The question we’re looking at here is how predictable a word is given all the context you have. Context can cover a wide range of information, including what’s already been said, the environment the words are said in, and personal knowledge about the speaker and the world. For instance, if you hear someone carrying an umbrella say “The weather forecast calls for”, you’re probably going to predict the next word is “rain”. If the speaker were carrying a shovel instead, you might guess “snow”.

If you want a quick estimate of the predictability of a word, you can use n-grams to give a sort of general probability for the next word. So, in (1a), the predictability of eulogy is estimated as the probability of seeing eulogy following The at the start of the sentence based on the counts in the corpus. Here’s how we get the one-in-a-million estimate:

p(eulogy|The) = C(The eulogy) / C(The) = 2288 / 2.3 billion ≈ 1 / 1,000,000

Let me break this equation down. The left-hand side, p(eulogy|The) is the estimated probability of seeing eulogy given that we started the sentence with The. This estimate is gotten by counting the number of times we see The eulogy at the start of a sentence, and dividing by the number of times we see The at the start of a sentence. (I’ve written these as C(The eulogy) and C(The).) The reason we’re dividing is that we know, when we reach the second word in the sentence, that the first word was The, and we want to know what proportion of those sentences continue with eulogy. (If we wanted the probability that a randomly-chosen sentence starts with The eulogy, we’d divide by the total number of sentences in the corpus instead.) In the corpus, there are 2.3 billion sentences starting with The, and 2288 starting with The eulogy. So, given that we’ve seen The, there’s a one-in-a-million chance we’ll see eulogy next.
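For readers who would rather see the arithmetic run, here is a minimal sketch of the same estimate in Python. The two counts are the rounded figures quoted above; the variable and function names are made up for illustration and are not part of any n-gram toolkit.

```python
# Counts quoted in the post (sentence-initial position in the n-gram corpus).
# Note that "2.3 billion" is a rounded figure, so the result is approximate.
count_the = 2_300_000_000   # C(The): sentences starting with "The"
count_the_eulogy = 2288     # C(The eulogy): sentences starting with "The eulogy"


def conditional_probability(bigram_count, context_count):
    """Estimate p(word | context) as C(context word) / C(context)."""
    return bigram_count / context_count


p_eulogy_given_the = conditional_probability(count_the_eulogy, count_the)

print(f"p(eulogy | The) = {p_eulogy_given_the:.2e}")
print(f"roughly 1 in {round(1 / p_eulogy_given_the):,}")
```

Swapping in the counts for any other bigram gives the corresponding estimate; dividing by the total number of sentences instead of C(The) would give the probability that a random sentence starts with The eulogy.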

From 1/1,000,000 to 1/20. Okay, thanks for sticking with me through the math. Now let’s talk about what this really means. In conversation, saying that the odds are a million to one against something means that it’s not going to happen, and yet we often see these linguistic one-in-a-million events. In fact, to finally get around to the point I mentioned in the post title, it turns out that if the first word of a sentence is The, there’s a one-in-twenty chance of a one-in-a-million event. How’s that?

Well, let’s start by thinking of a scenario where there is a 100% chance of a one-in-a-million occurrence: a sweepstakes with one million tickets. If the sweepstakes is above-board, someone has to win it. The probability of some specific ticket winning is one in a million, but at the same time, the probability that some ticket wins is one. The sweepstakes is guaranteed to generate a one-in-a-million event based on the way it is set up. That’s why it’s no surprise to find out someone won the lottery, but it’s a shock when it turns out to be you.

Now suppose you want to boost your chances by buying 1,000 tickets. Each individual ticket still has the one in a million probability, but the probability of the winning ticket being one of your purchased ones is now one in a thousand. This is sort of what’s going on in the linguistic world. In language, there are so many low-probability words that even though they individually have less than a one-in-a-million chance of following The, the aggregate likelihood of seeing one of these words is relatively high. The starts 2.3 billion sentences, and of those sentences, 0.15 billion continue with “rare” words like eulogy or kitties in the second position. Each of these words is individually rare, but there are a whole lot of them, so they carry a lot of probability mass.
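The two aggregations in that paragraph can be sketched in a few lines of Python. This is only an illustration using the rounded counts quoted above, and the variable names are invented here; with those rounded figures the rare-word share comes out at about 6.5%, the same ballpark as the one-in-twenty figure.

```python
# Sweepstakes: one winning ticket among a million, and you hold a thousand.
total_tickets = 1_000_000
tickets_held = 1_000

p_single_ticket = 1 / total_tickets        # any one specific ticket wins
p_you_win = tickets_held / total_tickets   # some ticket of yours wins

# Language: rounded counts from the post for sentences starting with "The".
count_the = 2.3e9                   # sentences starting with "The"
count_rare_continuations = 0.15e9   # those whose second word is a "rare" word

p_some_rare_word = count_rare_continuations / count_the

print(f"one ticket:     {p_single_ticket:.6f}")
print(f"1,000 tickets:  {p_you_win:.3f}")
print(f"some rare word after 'The': {p_some_rare_word:.3f}")
```

In both cases no individual outcome became more likely; the probability grew only because many individually rare outcomes were pooled together.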

Outside of language, too. This is a more general point: rare events aren’t so rare if you don’t care which rare event occurs. As a big sports watcher, I’m always amazed at a good sports statistician’s ability to find a rare event in the most mundane of games. For instance, consider today’s report from the Elias Sports Bureau, where they note that yesterday was the first time that there were seven or more baseball games in which the winning team scored less than four runs and won by a single run since May 21, 1978. It’s a rare event, sure, but if it hadn’t been this specific rare event, it would have been another.

The webcomic XKCD shows how science (and science journalism, especially) can suffer from this same problem. A statistically significant result is generally one where there is only a one-in-twenty chance of its occurring as a coincidence. But if you test, as the scientists in the comic do, twenty colors of jellybeans independently to see if they cause acne, at least one probably will appear to. (There’s a 64% chance of that, in fact.) Again, a rare event becomes expected simply because there are so many ways it can occur. This is why it’s easy to find coincidences.
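The 64% figure can be checked with a short calculation. This sketch assumes the twenty tests are independent and that no jellybean color has any real effect, so each test has a 5% chance of a false positive; the variable names are just for illustration.

```python
alpha = 0.05     # chance that a single test comes up "significant" by coincidence
num_tests = 20   # one test per jellybean color

# Probability that every one of the twenty tests avoids a false positive.
p_no_false_positive = (1 - alpha) ** num_tests

# Complement: at least one color appears to cause acne purely by chance.
p_at_least_one = 1 - p_no_false_positive

print(f"P(at least one false positive) = {p_at_least_one:.1%}")
```

Raising num_tests shows how quickly the "rare" coincidence becomes nearly certain.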

This is getting long, even for me, so let me wrap up with two basic points. If a lot of events occur, some of them will almost certainly be rare ones. Similarly, if all of the possible outcomes are individually rare, then the observed outcome will almost certainly be rare. It’s true in language, and it’s true in life. I’m sure you can find other morals to these stories, and I’d love to hear them.

[By the way, if you want to read more math posts written for a lay audience, go check out Math Goes Pop!, written by my college roommate who, unlike me, didn’t defect out of mathematics.]

The spectre of common usage is one of the greatest bugaboos for amateur grammarians, who fear that if we accept a usage because everybody’s using it, we’re weakening the language. For example:

We have often noted that often repeated language and grammar errors seem to become “correct” usage. Wouldn’t it be weird if math used that philosophy? When enough people said 2+2=5, it would! It would still equal 4, of course, but it would also equal 5.

I’ve heard this “2 + 2” kind of argument many times. It’s a false analogy, a virulent argument that seems reasonable but is wrong at its very core, and is wrong in multiple ways. It misrepresents both language and math, and that makes me mad, because the two things I’ve spent substantial portions of my academic life on are math and language. So let me tell you why this argument is rubbish in both aspects.

Grammaticality isn’t Truth. Math and language are different in a lot of ways. Duh, of course; don’t writers claim to not be “math people” and mathematicians claim to not be “language people”? Well, yeah, but it runs deeper than that. Language comes from our minds and our cultures. There isn’t some true, verifiable, Platonic version of language floating out in the ether that we’re trying to use. It is a social construct. And it’s a nebulous social construct, because we don’t know where it came from, or how it’s evolved over long periods of time. Hell, we don’t even have a clear view of the proper theoretical framework to analyze language (see, e.g., the debates between the Minimalists and HPSG or LFG syntacticians).

If everyone suddenly decides that the word flartish describes an object that is orange-brown, then the word means that. If everyone later decides that flartish describes an object that is wider than it is tall, then the word means that. There is no physical, no Platonic, no real meaning for a word. The meaning of a word is nothing more or less than what the speakers of its language believe it to be. That doesn’t mean that a meaning can’t be wrong; if you start re-assigning definitions to words, like Humpty Dumpty did to glory, you’ll be wrong in that no one will understand you.

The same is true of a language’s grammar. There is no English outside English, nor Telugu outside Telugu. When a language changes (and they do, constantly), that changes what is standard and non-standard in that language. The English you speak now is the result of billions of changes that took place over thousands of years from Proto-Indo-European. The reason that English and Albanian and Urdu aren’t the same language is that they all have undergone changes through common usage. It’s what happens. Math doesn’t change in this same way. Correct proofs can’t become incorrect in the way that grammatical sentences can become ungrammatical.

Sometimes 2+2 doesn’t equal 4. For comparison’s sake, how does mathematical truth work? Well, it works like this.* Before you do anything in math, you have to first lay down a set of axioms, a set of statements that you take as true and cannot prove. The most famous set of these is probably Euclid’s “4 + 1” postulates of plane geometry, which state the existence of line segments, lines, and circles, as well as the equivalence of all right angles and the uniqueness of parallel lines. If you want to prove something in Euclidean geometry, you build up from those axioms. If you want to define something (a triangle, for instance), you define it in terms of those axioms or in terms of things built up from those axioms. So when you prove something like “the sum of the angles of a triangle is 180 degrees” (one of the rudiments of Euclidean geometry that we learn as kids), it’s true only as long as the axioms are true.

Now, here’s the rub: a true theorem under one set of axioms is not necessarily true under other axioms. For instance, a triangle has to have 180 degrees worth of angles in plane geometry because Euclid’s axioms hold on a plane. A sphere breaks the Euclidean axioms, though, because parallel lines don’t exist on its surface.** This causes the mathematical truth to become untrue, as shown by a thought experiment, pictured below.

Suppose you decide to go to the North Pole. You start out heading due north from your current position. You get to the North Pole, and take a 90 degree turn to the right, and then head due south until you’re at the same latitude you started at. Now you’re hungry, so you decide to head back to your starting point. So you take another 90 degree turn to the right and head west until you’re back at your starting point. Then you turn 90 degrees one last time to face north and return to your starting alignment.

You're basically Admiral Byrd in this thought experiment, except that you actually made it to the North Pole.

You’ve made a triangle, but you’ve turned a total of 270 degrees. The “true” theorem that a triangle’s angles sum to 180 degrees isn’t true if its axioms aren’t valid. Returning to the seemingly stronger example of 2+2=4, it’s also true under certain arithmetical axioms and structures, but not all. If we’re talking about the natural numbers under ordinary addition, then yes, 2+2 equals 4. But if we move from base 10 to base 3, 2+2=11. And if we’re talking of the cyclic group of order 3, then 2+2=1, and 4 doesn’t exist.
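Both arithmetic examples are easy to try out. Here is a small Python sketch; to_base is a hypothetical helper written for this illustration, and the cyclic group of order 3 is modeled as the integers 0, 1, 2 with addition modulo 3. One thing it makes visible is that the base-3 case changes how the sum is written, while the modular case changes the sum itself.

```python
def to_base(n, base):
    """Write a non-negative integer n as a digit string in the given base (2 to 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))  # lowest-order digit first
        n //= base
    return "".join(reversed(digits))


total = 2 + 2

# Same quantity, two notations.
print("base 10:", to_base(total, 10))
print("base 3: ", to_base(total, 3))

# Cyclic group of order 3: the only elements are 0, 1, 2, and addition wraps around.
modulus = 3
print("mod 3:  ", (2 + 2) % modulus)
```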

Mathematical truth is only as true as its underlying axioms, and these examples show that when those axioms are changed, the “truth” falls apart. The claim that common usage could make two plus two not equal four isn’t scary; it’s obvious. We only take this as truth because in common usage, we’re usually talking about the infinite set of natural numbers.

Well, that’s sort of how different languages work. It’s as though English has a rule that says 2+2=7, Malay has a rule that says 2+2=5, and so on. But because the languages have different systems, those different rules can each be valid.

Languages aren’t “right”. There’s a point I really want to hammer home with all of this: who says the English we have now has two and two equalling four? The quote at the beginning of this post presupposes that the present form of English is “right”, and that new deviations from it must therefore be wrong. But our modern English is different not only from other contemporary languages but also from its earlier forms. How do we know that Old English didn’t have two and two equalling four and our modern version has it wrong?

Well, we do know that’s not the case, and that’s because languages aren’t inherently right or wrong. By various measures, one can argue that a specific change to a language brought on by common usage is helpful or harmful, but change itself is not inherently bad — or good, for that matter. So let’s stop deifying the language we currently have and demonizing the changes. It might well be that we’ve got the roles reversed.

*: I want to note here that I only have a Bachelor’s degree in math, and we really didn’t go too far into the philosophy of mathematics. I might have some mistakes or controversial interpretations with the details of this section (and if I do, please point them out in the comments), but I believe the core points of this section are accurate.

**: “Lines” on a sphere are restricted to circles that span a diameter of the sphere, so-called “great circles”. Any two great circles on a single sphere will inevitably intersect, so no non-intersecting lines exist in this geometry.

I feel like this past month more and more people have mentioned to me their belief that languages either do or should strive to be logical. On the one hand, this is an obvious point. A more logical language is a more learnable language, and since language is passed down from generation to generation, we expect that exceedingly difficult-to-acquire portions of a language will be eventually lost by this process. That’s fairly uncontroversial and is known as “regularization” in linguistics. But the problem is that the logic of language is generally opaque. It’s not the same as the logic of mathematics or the logic of argumentation, so it’s hardly obvious what it means for a language to be logical. I’d wanted to make a post about this, but I was having trouble saying what I meant to say. Thankfully, my labmate, Emily Morgan, ended up saying some great stuff about it in a comment elsewhere. She’s been kind enough to elaborate on those thoughts here. Without further babbling from me, here’s a guest post from her.

When linguists speak out against prescriptivism, one question we get asked is why we care so much about it. This post is an attempt to answer that question.

To begin with, it’s important to point out that linguists generally aren’t blanketly opposed to prescriptivism; rather, we’re opposed to uninformed or misinformed prescriptivism. So for example, I’m very much in favor of standard spelling and punctuation use, but with the understanding that these are more or less arbitrary conventions–not because I believe that these particular conventions are better than any others. Prescriptivist rules often come with supposed justifications, but under further scrutiny those justifications frequently don’t hold water. In particular, many rules are justified on the basis of some “logical” argument. The problem with that is that it’s easy to construct arguments that sound logical for certain cases, but don’t follow the bigger-picture logic of how language works. To give an analogy from mathematics, I could make a pseudo-logical argument that because we count …8, 9, 10… then the next number after 18, 19 should be 110. Of course, given an understanding of how the decimal system works, that’s nonsense. But without that broader understanding, it would sound logical. So bringing this back to language, if someone tries to argue, for example, that You drive too slow is incorrect because slow is an adjective not an adverb, that sounds logical under the simplified view that slow is an adjective while slowly is an adverb. But in the bigger picture, we find that slow can be used either as an adjective or an adverb–and has had both uses for hundreds of years.

That bigger-picture argument puts a lot of weight on descriptive generalizations about how native speakers use their language. I think it’s important to understand why linguists so often use arguments like these, which are based on descriptions of what native speakers do. The underlying reason is that language is a natural phenomenon, and our goal as linguists is to understand how it works. And to do so, we call upon all the empirical tools of science, and our primary source of data is the way that people actually do use language. Now, I recognize that how people do use language and how people should use language are not inherently the same thing. But I think that any claims about how people should use language need to be grounded in a solid understanding of what language is. And I think that many prescriptivists fundamentally misunderstand this. Language is not an ideal system that we as individual speakers are trying to draw upon or conform to. Language is something that we as a community of speakers collectively create and reinvent each time we speak. So any statement that we make about language is inextricably rooted in a descriptive generalization about what that community does. Even the most fundamental notions of grammar—things like the division of utterances into words, or the grouping of words into parts of speech—are not a priori assumptions about how communication should work: rather, they’re based on our empirical understanding of how speakers treat language.

So in the bigger picture, why do we linguists care about all of this? There’s a lot of reasons, but I think the most fundamental is that there’s hugely widespread misunderstanding of a topic that we care a lot about, and we feel a professional obligation to set the record straight. In the worst case, baseless prescriptions like “don’t split infinitives” or “don’t end a sentence with a preposition” actually lead to worse writing, as people learn to go through contortions to avoid what are actually perfectly standard grammatical constructions. In milder cases, people just waste their time trying to remember rules like the supposed distinctions between that/which and less/fewer, which are mostly harmless when followed, but equally harmless when violated. Additionally, as Gabe discussed recently, these shibboleths distract from the true pleasure of studying language, which is an amazingly rich and fascinatingly complicated system—but instead of being exposed to the excitement of unsolved questions in linguistics, people are instead being drilled on arbitrary and unnecessary rules. To draw another analogy to math, it’s the same sort of regret I feel for people who had poor math instruction early in school, and end up hating all things number-related, without ever seeing the beauty of abstraction that comes out in higher-level math. (If you are one of those number-haters, feel free to substitute your own favorite discipline or activity, and consider that sense of “But you don’t understand!” that you feel when someone misunderstands it or dislikes it for no good reason.)

Finally, I want to clarify that in arguing for more permissive, less prescriptive attitudes towards grammar, we are not trying to convince people to use language in ways that sound unnatural to them. As native speakers, we all have intuitions about what sounds right and what sounds wrong. Gabe can say “needs done”, but to me that sounds unnatural, and so I never use it myself. One underlying assumption to the linguist’s descriptive approach to language, which we probably don’t stress enough, is that there can be more than one right way to say something, and the fact that we are describing variation between speakers does not mean that we expect to find the same variation within all individual speakers. So no one is trying to convince you to say “needs done” if it sounds wrong to your ear—we’re only trying to convince you not to be upset if someone else does use it. As a caveat, I recognize that this position gets more complicated when thinking about English as a Second Language instruction, or when teaching people who have grown up speaking a dialect that deviates in major ways from Standard English, in which cases it’s obviously valuable to discuss what standards exist and what cultural implications they bear. But even in these cases, the fundamental ideas remain unchanged: we should acknowledge variation as natural, and any usage advice needs to be based on factually grounded descriptions of that variation.

About The Blog

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

About Me

I'm Gabe Doyle, currently a postdoctoral scholar in the Language and Cognition Lab at Stanford University. Before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

In my research, I look at how humans manage one of their greatest learning achievements: the acquisition of language. I build computational models of how people can learn language with cognitively-general processes and as few presuppositions as possible. Currently, I'm working on models for acquiring phonology and other constraint-based aspects of cognition.

I also examine how we can use large electronic resources, such as Twitter, to learn about how we speak to each other. Some of my recent work uses Twitter to map dialect regions in the United States.
