
You might have noticed that I’m on a bit of a hiatus this month. I’m working on my dissertation and preparing applications for post-grad-school jobs, but luckily something I’d done a little while ago has come through the pipeline for me to share.

Back in June, I presented a portion of my dissertation research at the NAACL-HLT conference, but all I had at the time was a computationally dense paper to show you. Well, the conference has uploaded videos of all the presentations, so if you’re interested in what I actually do academically, you can find out. In short, this portion of my research is about the improvements in word segmentation that happen when you combine multiple types of information instead of using a single type. It’s a computational model of how infants could use additional information to learn words better, as well as to learn the likely stress patterns for words in the language they’re learning.

The video [20 min, plus questions that are unfortunately hard to hear]

I tried to make the talk approachable for non-specialists, so take a gander if you want to see some of my dissertation research (which, of course, is pretty far afield from the discussions on this blog). There will be math, too, in case you are a specialist, and if you want the whole story, you can see the paper that accompanies the talk.

In other news, I’ll be giving a talk on some of my new research showing how Twitter can be used to map the range of dialectal syntactic variants (e.g., double modals like might could and the needs done construction) at the LSA annual meeting in Minneapolis on January 3.  Check out the abstract here, and maybe I’ll see you there!


You may have noticed that I’ve been quite bad about updating the blog the last couple of months. I’m sorry for my negligence, and I’m hoping summer will leave me with a bit more time to keep up the blog. But the reason I’ve been remiss is that it’s time to really batten down the dissertation hatches, and boy, that doesn’t leave the time or energy for much else.

Tomorrow morning, the battening of said hatches pays off a little bit, because I’ll be presenting a portion of my dissertation research at the conference of the North American Chapter of the Association for Computational Linguistics (NAACL). I don’t imagine too many of you are attending, but if you are, I’ll be presenting in Session C tomorrow (i.e., Monday) morning at 11:55, so swing on by.

If you’re not down here in balmy Atlanta, you can always read the paper in the comfort of wherever you are. It looks at how an infant learning a language can combine syllable identities and stress patterns to segment words within the language they’re learning. I’ll warn you, it’s a lot less accessible than the stuff I write here, but it’s the actual computational psycholinguistic research that I earn my keep with, so I hope you’ll give it a look if you’re interested in such things.

Today I’m unveiling a little side project I’ve been doing off and on for the past few months, one that I previewed a bit in last week’s All of what sudden? post. It’s called SeeTweet, and it generates maps with the locations of the most recent tweets containing a search term. So if, for instance, you want to assess the geographical extent of a dialectal variant, you can. Let’s say you’ve been hearing about the needs done construction, as in

(1) Maybe the majority’s attitude needs adjusted

and now you want to know where people say something so silly. Well, SeeTweet can tell you:

Mapping "needs fixed" with SeeTweet

As you can see, it’s pretty well localized to a stretch from Iowa to central Pennsylvania, a region similar to the (North?) Midland dialect region.* Of course, this particular case doesn’t need SeeTweet. Murray, Frazer, and Simon wrote a series of papers detailing the geographic range of this and related usages (e.g., wants done) in the late 90s, and the Yale Grammatical Diversity Project has also mapped known usages of needs done. But whereas this previous work has required a lot of time and effort, SeeTweet provides a quick and easy approximation, a starting point for more advanced investigations.
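The post doesn’t describe how SeeTweet works internally, so what follows is only a minimal sketch of the basic idea (find the geotagged tweets containing a phrase, then bin their locations for a map), assuming the tweets have already been collected as hypothetical `Tweet` records with optional latitude and longitude. The names `Tweet`, `phrase_pattern`, `matching_locations`, and `grid_counts` are made up for illustration; fetching tweets and drawing the map are left out entirely.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Tweet:
    # Hypothetical record for illustration; real tweet data has many more fields.
    text: str
    lat: Optional[float] = None  # None when the tweet carries no geotag
    lon: Optional[float] = None


def phrase_pattern(phrase: str) -> re.Pattern:
    # Match the phrase as whole words, case-insensitively, allowing any run
    # of whitespace between words (so "needs  fixed" still matches).
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def matching_locations(tweets: Iterable[Tweet], phrase: str) -> List[Tuple[float, float]]:
    # Keep only geotagged tweets whose text contains the phrase.
    pat = phrase_pattern(phrase)
    return [
        (t.lat, t.lon)
        for t in tweets
        if t.lat is not None and t.lon is not None and pat.search(t.text)
    ]


def grid_counts(coords: Iterable[Tuple[float, float]], cell_deg: float = 1.0) -> Counter:
    # Bin points into cell_deg-by-cell_deg boxes, keyed by each box's southwest corner.
    return Counter(
        (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)
        for lat, lon in coords
    )


# Usage with made-up sample tweets:
sample = [
    Tweet("This car needs fixed", 40.44, -79.99),
    Tweet("The roof needs fixed again", 41.60, -93.61),
    Tweet("The roof needs fixing", 34.05, -118.24),  # "fixing" does not match
    Tweet("my phone needs fixed"),                   # no geotag, so it is dropped
]
locs = matching_locations(sample, "needs fixed")
# locs == [(40.44, -79.99), (41.6, -93.61)]
print(grid_counts(locs))
```

A one-degree grid is only a crude stand-in for the dialect regions discussed above, and raw counts largely track where Twitter users live, so dividing each cell’s count by that cell’s overall tweet volume would be a sensible next step in a sketch like this.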

It’s no replacement for the YGDP or the Dictionary of American Regional English, of course; its data are much noisier than those of either project. It can offer a different kind of view, though, one that can track more ephemeral usages (e.g., event-related usages like “Carmageddon” or “Jerry Meals”) in real time, as well as assemble a lot of data on persistent usages (e.g., pop and soda).

So I’m hoping that you’ll be able to go out and use SeeTweet to look into the geographical distribution of something interesting, whether for academic purposes or just to waste time at the end of the week. I’ve put together some sample investigations in a SeeTweet gallery, and I’d love to see what sort of great uses you’ll put it to. If you find something neat, leave a comment here or in the gallery, or send an email to

[Several friends offered great advice and testing on earlier versions of SeeTweet and must be acknowledged for it. Thanks to Dan (who came up with the name SeeTweet), Rodolfo, Maria, Casey, Ari, Rebecca, Noah, Anoush, and Chris.]

*: There are a couple of dots out West, but I’m betting that those are from immigrants like me who were raised in the Midland region and ended up out West.

What’re you doing this Friday? If you’re around Whistler, Canada, you should come to the “Applications for Topic Models: Text and Beyond” workshop at NIPS, where you can learn about all sorts of new, well, applications for topic models! I’ll be delivering a talk on financial applications of topic models, such as identifying industrial sectors and discovering the underlying relationships between companies. It’s a way of learning about the economy without having any distracting chance of making money off of it!


About The Blog

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

About Me

I'm Gabe Doyle, currently a postdoctoral scholar in the Language and Cognition Lab at Stanford University. Before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

In my research, I look at how humans manage one of their greatest learning achievements: the acquisition of language. I build computational models of how people can learn language with cognitively general processes and as few presuppositions as possible. Currently, I'm working on models for acquiring phonology and other constraint-based aspects of cognition.

I also examine how we can use large electronic resources, such as Twitter, to learn about how we speak to each other. Some of my recent work uses Twitter to map dialect regions in the United States.

@MGrammar on Twitter
