
You might have noticed that I’m on a bit of a hiatus this month. I’m working on my dissertation and preparing applications for post-grad-school jobs, but luckily something I’d done a little while ago has come through the pipeline for me to share.

Back in June, I presented a portion of my dissertation research at the NAACL-HLT conference, but all I had at the time was a computationally dense paper to show you. Well, the conference has uploaded videos of all the presentations, so if you’re interested in what I actually do academically, you can find out. In short, this portion of my research is about the improvements in word segmentation that happen when you combine multiple types of information instead of using a single type. It’s a computational model of how infants could use additional information to learn words better, as well as to learn the likely stress patterns for words in the language they’re learning.

The video [20 min, plus questions that are unfortunately hard to hear]

I tried to make the talk approachable to the non-specialist, so take a gander if you want to see some of my dissertation research (which, of course, is pretty far afield from the discussions on this blog).  There will be math, too, in case you are a specialist, and if you want the whole story, you can see the paper that accompanies the talk.

In other news, I’ll be giving a talk on some of my new research showing how Twitter can be used to map the range of dialectal syntactic variants (e.g., double modals like might could and the needs done construction) at the LSA annual meeting in Minneapolis on January 3.  Check out the abstract here, and maybe I’ll see you there!

You may have noticed that I’ve been quite bad about updating the blog the last couple of months. I’m sorry for my negligence, and I’m hoping summer will leave me with a bit more time to keep up the blog. But the reason I’ve been remiss is that it’s time to really batten down the dissertation hatches, and boy, that doesn’t leave the time or energy for much else.

Tomorrow morning, the battening of said hatches pays off a little bit, because I’ll be presenting a portion of my dissertation research at the conference of the North American Chapter of the Association for Computational Linguistics (NAACL). I don’t imagine too many of you are attending the conference, but if you are, I’ll be presenting in Session C tomorrow (i.e., Monday) morning at 11:55, so swing on by.

If you’re not down here in balmy Atlanta, you can always read the paper in the comfort of wherever you are. It looks at how an infant learning a language can combine syllable identities and stress patterns to segment words within the language they’re learning. I’ll warn you, it’s a lot less accessible than the stuff I write here, but it’s the actual computational psycholinguistic research that I earn my keep with, so I hope you’ll give it a look if you’re interested in such things.

Today I’m unveiling a little side project I’ve been doing off and on for the past few months, one that I previewed a bit in last week’s All of what sudden? post. It’s called SeeTweet, and it generates maps with the locations of the most recent tweets containing a search term. So if, for instance, you want to assess the geographical extent of a dialectal variant, you can. Let’s say you’ve been hearing about the needs done construction, as in

(1) Maybe the majority’s attitude needs adjusted

and now you want to know where people say something so silly. Well, SeeTweet can tell you:


Mapping "needs fixed" with SeeTweet

As you can see, it’s pretty well localized to a stretch from Iowa to central Pennsylvania, a region similar to the (North?) Midland dialect region.* Of course, this particular case doesn’t need SeeTweet. Murray, Frazer, and Simon wrote a series of papers detailing the geographic range of this and related usages (e.g., wants done) in the late 90s, and the Yale Grammatical Diversity Project has also mapped known usages of needs done. But whereas this previous work has required a lot of time and effort, SeeTweet provides a quick and easy approximation, a starting point for more advanced investigations.

It’s no replacement for the YGDP or the Dictionary of American Regional English, of course; it’s much noisier data than either of these projects. It can offer a different kind of view, though, one that can be assembled to track more ephemeral usages (e.g., event-related usages like “Carmageddon” or “Jerry Meals”) in real time, as well as assembling a lot of data on persistent usages (e.g., pop and soda).
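For the curious, the basic idea of mapping a search term can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not SeeTweet's actual code: the record fields (`text`, `lat`, `lon`), the function names, and the sample posts are all made up, and it works on a list you already have in hand rather than calling any Twitter API. It keeps the located posts that contain the phrase, then counts them per one-degree grid cell so they could be handed to whatever plotting tool you like.

```python
from collections import Counter


def locate_matches(posts, phrase):
    """Return (lat, lon) pairs for posts whose text contains the phrase."""
    needle = phrase.lower()
    points = []
    for post in posts:
        # Posts with no attached location cannot be placed on a map.
        if post.get("lat") is None or post.get("lon") is None:
            continue
        # Case-insensitive substring match on the post text.
        if needle in post["text"].lower():
            points.append((post["lat"], post["lon"]))
    return points


def bin_points(points, cell_degrees=1.0):
    """Count points per grid cell, keyed by the cell's south-west corner."""
    counts = Counter()
    for lat, lon in points:
        cell = (
            lat // cell_degrees * cell_degrees,
            lon // cell_degrees * cell_degrees,
        )
        counts[cell] += 1
    return counts


# Hypothetical sample records, invented for this sketch.
sample_posts = [
    {"text": "My car needs fixed again", "lat": 40.4, "lon": -80.0},
    {"text": "This needs fixed before Friday", "lat": 40.9, "lon": -80.7},
    {"text": "The sink needs to be fixed", "lat": 34.0, "lon": -118.2},
    {"text": "Laptop needs fixed", "lat": None, "lon": None},
]

points = locate_matches(sample_posts, "needs fixed")
cells = bin_points(points)
```

Note that the standard "needs to be fixed" is not matched, and the post without coordinates is dropped. Dropping unlocated posts is one reason data like this is noisy: only the posts that carry a location can show up on the map.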

So I’m hoping that you’ll be able to go out and use SeeTweet to look into the geographical distribution of something interesting, whether for academic purposes or just to waste time at the end of the week. I’ve put together some sample investigations in a SeeTweet gallery, and I’d love to see what sort of great uses you’ll put it to. If you find something neat, leave a comment here or in the gallery, or send an email to

[A couple of friends offered great advice/testing on earlier versions of SeeTweet and must be acknowledged for it. Thanks to Dan (who came up with the name SeeTweet), Rodolfo, Maria, Casey, Ari, Rebecca, Noah, Anoush, and Chris.]

*: There are a couple of dots out West, but I’m betting that those are from immigrants like me who were raised in the Midland region and ended up out West.

What’re you doing this Friday? If you’re around Whistler, Canada, you should come to the “Applications for Topic Models: Text and Beyond” workshop at NIPS, where you can learn about all sorts of new, well, applications for topic models! I’ll be delivering a talk on financial applications of topic models, such as identifying industrial sectors and discovering the underlying relationships between companies. It’s a way of learning about the economy without having any distracting chance of making money off of it!

About The Blog

A lot of people make claims about what "good English" is. Much of what they say is flim-flam, and this blog aims to set the record straight. Its goal is to explain the motivations behind the real grammar of English and to debunk ill-founded claims about what is grammatical and what isn't. Somehow, this was enough to garner a favorable mention in the Wall Street Journal.

About Me

I'm Gabe Doyle, currently an assistant professor at San Diego State University, in the Department of Linguistics and Asian/Middle Eastern Languages, and a member of the Digital Humanities. Prior to that, I was a postdoctoral scholar in the Language and Cognition Lab at Stanford University. And before that, I got a doctorate in linguistics from UC San Diego and a bachelor's in math from Princeton.

My research and teaching connect language, the mind, and society (in fact, I teach a 500-level class with that title!). I use probabilistic models to understand how people learn, represent, and comprehend language. These models have helped us understand the ways that parents tailor their speech to their child's needs, why sports fans say more or less informative things while watching a game, and why people who disagree politically fight over the meaning of "we".
