Reflections on Graduate School

I nearly failed out of grad school. People who know me already know this, and the politics behind it, though not the science. But I guess now you do, too. And over a decade later, I now know that it was all over protecting a lie. Let me explain.

Different linguistics programs have different ways of doing things. But the one that I attended at UPenn followed the relatively normal system of having mandatory coursework in the first few years, qualifying papers (you write a paper for publication and it’s graded by your professors), and language exams. In our case, that meant translating an academic paper from a language relevant to your research, sitting a traditional competence exam (although more on that later), or writing a paper about a “sufficiently exotic language.” Yeah, that’s a direct quote. I’m told that when I translated a French research paper for my first exam, I translated more than anybody ever had. I thought you were supposed to translate the whole thing in an hour, and that I’d failed. And you couldn’t study at Penn without taking a full year of generative syntax.

Did it matter that my focus was initially game-theoretic pragmatics, or that I ultimately wound up writing a dissertation on sociolects that uses geospatial statistical methods? No. Was there a course on introduction to linguistic typology and the kinds of broad questions relevant to the field? No. Somehow the syntacticians got a stranglehold on the foundational training requirements, and linguistic typology was taught incidental to learning syntax in a Chomskyan tradition.

And in the first semester, the syntax professor gave me a C on the midterm, which is basically the kiss of death. I won’t tell you the story of how he told me nobody has ever come back from this, and that I should just quit, and that I shouldn’t have been admitted in the first place, and how I listened and then asked him simply what it would look like to come back from this, and extracted, painfully extracted, by staying consistently focused on message, what his actual standards were, and then met and exceeded them on the final in the course of like 10 days.

The point of the story is that generative grammar in a Chomskyan approach, first X-bar theory and government and binding and later minimalism, was the screener, the gatekeeper, the barrier, the flaming hoop that you have to jump through. If you want to know what a real intro to linguistics looks like, the broad questions, typology, what the field actually does when it isn’t busy gatekeeping, I’m making one. Link in the description. Consider it the orientation Penn forgot to give me.

I should probably mention that Penn at the time had a reputation for failing students out, of course, after paying their stipend for a few years, because why burn $30,000 to $100,000 when you could waste the same amount of money tormenting a grad student for a few years? It was an absolute hazing. In my time there, I saw at least four people either asked not to return or given a terminal master’s degree, which sounds much more morbid than it is, as a parting gift. Some just walked off and never came back. Pretty sure one left after their language exam turned out to be just an antagonistic conversation in German, with a syntax professor grilling them about the double passive construction in German.

So I learned syntax. I learned the [ __ ] out of it. I learned it so hard that I wrote a paper, included in conference proceedings, that prompted the professor who threatened to fail me out (I eventually got a gentleman’s B) to come find me in the grad student office I was working in and ask me if I wrote it myself and who had helped me. I read all the books that I could find on X-bar theory and minimalism. I have feelings about the approaches in various textbooks, including some unpublished ones. I devoured Chomsky and syntax. I had my doubts. My adviser even referred to it as a “toxic system,” which, if you’re an academic, is a sick burn. Things my wife said in casual conversation consistently broke the model that I’d learned. And a co-author of mine on the descriptive grammar of Black English just handed me a book that brought the whole house of cards down.

By the way, scan the QR code or follow the link in the description if you want to know more about that project and sign up for updates. I’m completely rebuilding my idea of how the world works, like a freaking cult survivor. And today I’m going to share that with you.

This is not going to be criticizing Chomsky for political things like denying the Cambodian genocide or his anti-government writings while accepting money from federal defense grants, and it’s not going to criticize him for social things like his horrifically embarrassing interview with Ali G (“How many words does he know? What are some of them?”) or even his epistolary correspondence with Jeffrey Epstein, where he brainstormed how to rehabilitate the latter’s public image after the trafficking was widely known. Chomsky is, by all accounts, someone who doesn’t research who he’s talking to and who is credulous and eager to help to a fault. He’s thinking about syntax the whole time.

This is Language Jones.

Anyway, I’m going to keep it linguistic. First, I’m going to explain what Chomsky got right and why he’s so important. Then, I’ll explain the challenges to his theory, including the niggling doubts I had even at the very beginning of grad school. And finally, I’ll explain the alternatives that I’m exploring and how they’re the last nail in the coffin.

First, let’s start with how Chomsky got famous and what Chomsky got right. His rise to fame in linguistics in the 1960s coincided with the cognitive revolution he helped kick off. A huge influence was his scathing 1959 review of B.F. Skinner’s Verbal Behavior, eviscerating Skinner’s behaviorist approach to language learning, which more or less reduced human language to stimulus-response, like classical conditioning of dogs. He was developing his theory of transformational grammar at the time.

And despite having not been enrolled at Penn for four years at the time, in 1955 he submitted a thesis and was awarded the doctorate. It’s astounding how different the times are. Anyway, he wrote Syntactic Structures in 1957 and rose to fame on the basis of that and his epic takedown of Skinner two years later. His approach has changed over the years. It’s been about 70 years of theorizing and work, but there are a few key points that he got really right, and I think they’re worth stating explicitly.

First, those who criticize Chomsky often criticize the concept of generative grammar. That’s not up for debate. We clearly have the ability to make use of a limited set of symbols or mental objects and create infinite novel utterances from combining them in new ways. Not only that, but the ways we combine them are constrained. There are grammatical and ungrammatical sentences. In linguistics, ungrammatical doesn’t mean socially stigmatized, like using a double negative. It means something that completely breaks your ability to communicate or parse the sentence. Something like, “What do you like and broccoli?”

Quick aside, if you find this stuff interesting and you want a proper grounding in what linguists mean by grammar, grammatical, and the dozen or so other terms I’m about to throw around, my intro to linguistics course is in the description. It’ll make the rest of the video hit a little bit harder.

It’s worth flagging up front: generativity itself isn’t the controversy. The alternatives I’ll be talking about, dependency grammar and construction grammar, are also generative in this broad sense. They account for the discrete infinity of language too. The word “generative” just got captured by one specific research program. So a lot of people think rejecting Chomsky means rejecting generativity, and it doesn’t. These frameworks just generate differently, by combining constructions or by linking heads and dependents, without phrase structure trees and without movement.

Chomsky and his acolytes developed a very robust system for exploring how you can take a small number of pieces and a small number of conceptual rules and generate language. Their goal was to describe a mental architecture that can give rise to all and only natural human languages. That is, it doesn’t over- or undergenerate. They ended up pursuing an approach that uses graph theory. It treats words (we’re not going to define that for now) as nodes in an acyclic directed graph.
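For readers who think in code, here is a minimal sketch of the "small number of pieces, small number of rules" idea. It is a toy rewrite grammar invented for illustration, not anyone's actual theory of English: the rule set, the vocabulary, and the names `RULES`, `expand`, and `sentences` are all assumptions. The one recursive rule (a verb phrase can contain a whole sentence) is what lets a finite rule set keep producing new sentences as the depth limit grows.

```python
import itertools

# Toy rewrite rules, made up for illustration only.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V", "that", "S"]],  # second option is recursive
    "Det": [["the"]],
    "N": [["linguist"], ["cake"]],
    "V": [["ate"], ["said"]],
}


def expand(symbol, depth):
    """Yield each word sequence derivable from `symbol` within `depth` rule applications."""
    if symbol not in RULES:
        # Anything without a rule is a word: emit it as-is.
        yield [symbol]
        return
    if depth == 0:
        # Out of budget: this derivation path produces nothing.
        return
    for right_hand_side in RULES[symbol]:
        # Expand every symbol on the right-hand side, then combine the pieces.
        parts = [list(expand(s, depth - 1)) for s in right_hand_side]
        for combo in itertools.product(*parts):
            yield [word for part in combo for word in part]


def sentences(depth):
    """Return all sentences derivable from S within the depth limit, sorted."""
    return sorted(" ".join(words) for words in expand("S", depth))


for limit in (4, 6, 8):
    print(limit, len(sentences(limit)))
```

Raising the limit never exhausts the grammar, which is the "discrete infinity" point in miniature. Note that the toy happily produces "the cake ate the linguist": it says nothing about meaning, only about which strings the rules allow.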

Chomsky later imposed the condition that they’re all binary branching, for elegance of the theory and parsimony. Another quick aside that’ll matter later: dependency grammar also uses directed graphs, but the nodes are words connected directly to other words with no intervening phrasal nodes. No NP, no VP, no IP, no CP, just words and the asymmetric relationships between them. That’s a big deal because it means that the entire scaffolding of phrases that Chomsky’s theory rests on is, in dependency grammar, just not posited. It’s an ontological commitment Chomsky made that you don’t actually have to make. Tesnière was doing this in the 1950s, parallel to and independent of Chomsky.

Part of Chomsky’s approach was to assume that there is a conceptual category called a phrase. So for instance, you might have a noun phrase that has a head, the actual noun, and arbitrarily many modifiers. The insight is that the whole thing acts like the head taking on its category. So when I said the whole thing, I could just as easily replace that with it. It acts like the head, and the sentence is perfectly grammatical because the categories match. Whereas if I tried to only use part of it, we’ve got problems.

Dependency grammar accounts for headedness, too. In fact, it does so more directly: the head in dependency grammar isn’t an abstraction from a phrase. It just is the word that governs the dependents. The substitutability I just described falls out for free, because when “it” replaces “the whole thing,” it’s just taking the same head position in the dependency structure. You don’t need to posit a phrase that acts like its head. The head is the structural anchor from the start.

Where it gets tricky is where Chomsky adds movement. The idea is that there’s an underlying, base-generated mental form of a sentence structure, and other structures are derived by movement. So normal sentence structure in English is “I gave a chocolate to my wife.” To simplify, the Chomskyan approach would be to say that the passive “my wife was given a chocolate” is derived from the basic structure of the other sentence. There’s a deep structure, which is the cognitive architecture, and a surface structure, which is the actual sentence we say or write.

incremental. Speakers really do start sentences without fully planning them. That empirical work is much more compatible with construction-based and dependency-based approaches that don’t require a fully specified deep structure to exist before any words come out of your mouth.

The Chomsky GG folks rebut that their model is one that explains the relationships among structures in an utterance, but isn’t attempting to be an explicit exact definition of what goes on in the brain. Except that’s exactly what they claim. If you’re not modeling language in the brain, which you absolutely are when you talk about your language acquisition device in the brain, then what are we even doing here? And honestly, the framework slides between those two claims in ways that immunize it from both kinds of evidence.

Performance data, that’s just performance, not competence. Brain data, well, neuroscience hasn’t caught up to the theory yet. This is what philosophers of science call an unfalsifiable framework. Heads I win, tails you lose. Not to mention that empirical studies consistently challenge claims about what is happening in the brain and what is possible. Psycholinguistic research demonstrated that Chomsky’s whole bit about anaphors, words like “himself,” is just not supported empirically.

So we moved on from government and binding when it became clear that the purely universal principles A, B, and C, the rare, actually falsifiable claims, were false, or at least some of them were. The cross-linguistic work was particularly damaging. Long-distance reflexives in Mandarin, Icelandic, and Japanese showed that principle A, as originally stated, couldn’t be universal. Logophoric pronouns and the discourse sensitivity of binding more generally turned out to be empirically thorny. Each fix made the theory more baroque and less predictive.

The nail in the coffin for me was the book “Syntax: A Cognitive Approach,” which is an introduction to dependency grammar. Now, here’s the thing. It’s still acyclic directed graphs with dependency relationships, but they posit that you don’t get “John ate the cake” and “the cake was eaten by John” by transforming one into the other. There’s just two different structural things.

And as I explore the literal decades of literature on dependency grammar and construction grammar, literature that I was told either didn’t exist or wasn’t important, or that was dismissed out of hand, I find that it aligns with the rest of the science I was familiar with, with the models of cognition that other fields have robust empirical evidence for, and, frankly, with common sense.

Let me make this concrete. In a dependency analysis of “John ate the cake,” “ate” is the root, with “John” as its subject dependent and “cake” as its object dependent, and with “the” dependent on “cake.” In “the cake was eaten by John,” the structure is different. “Cake” is the subject dependent of the verbal complex and “John” is a dependent inside the by-phrase. Two distinct dependency structures, both directly produced, neither derived from the other. The relationships between them, like the fact that they share truth conditions, are semantic and pragmatic facts, not syntactic ones.
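The two analyses just described can be written down as plain data. This is a sketch under stated assumptions: the relation labels are illustrative, and treating “was” as the root of the passive with “by” heading “John” is one of several analyses found in dependency traditions, chosen here only for concreteness. The helper names `root`, `dependents`, and `words` are made up for the example.

```python
# Each arc is (dependent, head, relation). Labels are illustrative assumptions.
ACTIVE = [
    ("John", "ate", "subject"),
    ("cake", "ate", "object"),
    ("the", "cake", "determiner"),
]

# One possible analysis of the passive; other dependency traditions differ
# on whether the auxiliary or the participle is the root.
PASSIVE = [
    ("cake", "was", "subject"),
    ("the", "cake", "determiner"),
    ("eaten", "was", "participle"),
    ("by", "eaten", "agent-phrase"),
    ("John", "by", "complement"),
]


def words(arcs):
    """Every node in the graph: just the words, no phrasal nodes."""
    return {d for d, _, _ in arcs} | {h for _, h, _ in arcs}


def root(arcs):
    """The single word that heads others but depends on nothing."""
    heads = {h for _, h, _ in arcs}
    deps = {d for d, _, _ in arcs}
    (only_root,) = heads - deps  # raises if there is not exactly one root
    return only_root


def dependents(arcs, head):
    """Map each direct dependent of `head` to its relation label."""
    return {d: rel for d, h, rel in arcs if h == head}


print(root(ACTIVE), dependents(ACTIVE, root(ACTIVE)))
print(root(PASSIVE), dependents(PASSIVE, root(PASSIVE)))
```

Nothing in the code converts `ACTIVE` into `PASSIVE` or back. They are two separately stated structures, and every node in either graph is a word from the sentence.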

And beyond that, constructions like “the more the merrier” suddenly aren’t a problem. The dative alternation, the difference between “I gave my wife chocolate” and “I gave a chocolate to my wife,” ceases to be a problem. They’re communicating slightly different things based on information structure and the existing context. And I don’t have to figure out how to make one into the other.

Even things like construction grammar’s ideas of inheritance and coercion are basically exactly what they sound like if you’ve ever coded a class in Python. Take coercion. When “sneeze” appears in a caused motion construction, “she sneezed the napkin off the table,” the construction coerces the verb into a transitive caused motion reading. In a movement-based grammar, this is mysterious. In construction grammar, it’s the construction contributing meaning the verb doesn’t have on its own, just like how a parent class in Python contributes methods to a child class.
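Since the comparison is to Python classes, here is a minimal sketch of what that analogy could look like. Everything in it is a hypothetical illustration: the class names, the role labels (`agent`, `theme`, `path`), and the idea of flagging coercion by comparing role sets are assumptions made for the example, not a published formalization of construction grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    lemma: str
    roles: tuple  # participant roles the verb licenses on its own


class ArgumentStructureConstruction:
    """Parent construction: slot-filling logic that every child inherits."""

    roles = ()
    meaning = None

    def interpret(self, verb, **fillers):
        missing = set(self.roles) - set(fillers)
        if missing:
            raise ValueError(f"unfilled slots: {sorted(missing)}")
        return {
            "event": verb.lemma,
            "roles": {role: fillers[role] for role in self.roles},
            "meaning": self.meaning,
            # Coercion: the construction supplies roles the verb lacks.
            "coerced": not set(self.roles) <= set(verb.roles),
        }


class CausedMotion(ArgumentStructureConstruction):
    """Child construction: inherits interpret() and contributes its own meaning."""

    roles = ("agent", "theme", "path")
    meaning = "agent causes theme to move along path"


# Hypothetical lexical entries: "sneeze" only licenses a sneezer on its own.
sneeze = Verb("sneeze", roles=("agent",))
put = Verb("put", roles=("agent", "theme", "path"))

sneezed = CausedMotion().interpret(
    sneeze, agent="she", theme="the napkin", path="off the table"
)
placed = CausedMotion().interpret(
    put, agent="she", theme="the napkin", path="on the table"
)
print(sneezed["coerced"], placed["coerced"])
```

The caused-motion meaning lives on the construction, so “sneeze” gets it without needing a special transitive entry of its own. With “put,” the verb already licenses all three roles, so no coercion is flagged.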

And “the more the merrier” isn’t a weird exception to be banished to the periphery. It’s a construction with slots, “the X-er the Y-er,” sitting on the same continuum as the transitive and the passive. They differ in degrees of productivity and abstractness, not in kind. It’s constructions all the way down. Even the dative alternation turns out not to be a single alternation.

Beth Levin and others have shown that different verb classes pattern differently, and that the choices between forms are conditioned by information structure, weight, animacy, and discourse status. Quick pause on animacy: that’s why you can say things like, “I slid her the book,” but you can’t say, “I slid the door the book.” None of that falls out of a movement analysis. All of it makes sense if you start from constructions.

So, here I am, just over 5 years out from grad school, finally letting myself say out loud what my adviser was already not so subtly hinting at when he called it toxic. The thing that nearly ended my career, the thing that I was told was the science of language, the thing that gatekept an entire generation of linguists out of a PhD program: it’s not the only game in town. It’s not the best game in town. And it’s not even, by the standards that we hold every other science to, an especially good game.

There are decades of work in dependency grammar and construction grammar that I was told just didn’t exist, or didn’t matter, or had been subsumed by the Chomskyan framework. And none of that was true. The frameworks I was kept away from align better with cognition, better with neuroscience, better with what we actually see kids doing when they learn language, and better with the cross-linguistic data. I wasn’t bad at syntax. The syntax I was being taught was bad, and they should feel bad. Your music’s bad, and you should feel bad.

And the relief of saying that, I can’t even describe it. If you’ve ever been deep in something where the explanations kept not quite working, where you kept having to be told the doubts that you were having were just because you didn’t really fully understand it well enough yet, where the smart people around you kept gently suggesting maybe you weren’t cut out for this, you know, a cult. And then one day you read the right book and the whole thing snaps, and you realize that the doubts were the data, were the facts.

Yeah, I’m not a cult survivor in any literal sense, but the cognitive shape of it is real, and I see it in other people who escaped other intellectual traditions that overpromised and underdelivered. You’re allowed to leave. If any of this resonated, if you’ve had your own version of this experience in linguistics or somewhere else, I want to hear about it in the comments. Subscribe if you want more in this vein, because I have a lot more to say about syntax, about what gets taught as foundational versus what actually is, and about the politics of who gets to be a linguist.

And one more time, because it matters, scan the QR code or hit the link in the description for the descriptive grammar of Black English project. That’s the work that broke the model for me. That’s the kind of linguistics I want to do. Come along. Until next time, happy learning.

THE ARTFUL DILETTANTE

Keeper of the Flame of the Enlightenment
