All posts by richard

Foreign Correspondence: Doing a Computational Linguistics Masters in Germany

I am currently sitting in a conference in Germany, listening to my project leader ramble on about the benefits of short coding courses for students. At least, I hope that is what she is talking about. I don’t know German. How did I get here? I’m not sure either. But, in an attempt to help you understand what it means to do a masters in Computational Linguistics, I’ll try and explain.

Getting into Linguistics

I graduated from the University of Edinburgh last year. I had gone there for my entire undergraduate, but Linguistics was only the center of that for two years. It’s worth talking about the longer view if you want to understand that. So, the long view: around 3.5 billion years ago, life began. Shorter view: I read the Chronicles of Narnia when I was 4, and knew that I wanted to read books for the rest of my life. When I learned one can write them, I wanted to be an English professor. Skip forward through high school, and I’m at Edinburgh doing an undergraduate degree in English and Classical Literature.

Around a week in I realise I don’t need to be in Classical Literature – after 5 years of Latin in high school, I didn’t need to read the Aeneid in translation. Not that I was ever very good at Latin, but I stuck with it. So, I switch courses for the first time. I do Greek and English Literature. After a semester, I grow increasingly tired of the utter powerlessness of English Literature to transform texts. I didn’t feel like I was opening my eyes reading a text, and I didn’t walk away with a better life after applying a postmodern, Barthian view to Confessions of a Justified Sinner. So, I dropped English. This sort of concern always lingered though – I feel that what you’re studying, which is essentially what you’re living, should stand up well to the question ‘How does this help anyone?’ We’ll get to how Linguistics answers this in a bit.

And this is where Linguistics begins. I realise I kind of like it. I always liked Latin, and Greek, and English – not for themselves, but because they were language. If you’re reading this blog, you don’t really need to hear a long praise of Linguistics, you know what I’m talking about. So, I switched to Greek and Linguistics. No one was surprised a year later when I realised I’d rather study Language than language, and dropped Greek. So, for my honours years, I was 100% Linguistics.

Linguistics to Computational Linguistics

Of course, that doesn’t mean I studied very hard. I goofed around a lot. I learned Na’vi. I did a lot of society stuff, and tended to spend a lot of nights in the Auld Hoose and Opium. There was one course that got my attention, though – Simulating Language. Here was a course that had an answer to my question, ‘Why would anyone really care about pronoun usage in Othello?’ The baseless theorizing of English literature was discarded in favor of experimental modelling – actually trying to get to the bottom of how language might have evolved. And, what’s best, it involved Python. I was a bit scared at first, but after I had this weird idea of coding glossolalia and seeing if language might have evolved from drunk shamans rambling around the fire in the open savanna, I ended up learning it myself and trying to code in it. This later turned into my thesis – actually, that focused on word segmentation in nine-month-old children, but the code is surprisingly similar.

If any of you have ever coded, you might know what I mean. If you haven’t – imagine that first time you switched to Gmail. Or imagine when you first drove a car. Or imagine learning to walk – you take this obscure, impossible system, and make it work. And suddenly, things are doing what you want them to. It’s the ultimate power rush. I liked that. I also liked that you could make money (well, live) doing this sort of stuff. And I could potentially make a career out of it. Finally, I also liked the fact that the question – who does this help? – could be answered. For instance, some of my work on Na’vi involved coding. And although it was a hobby, and a fake language, and people don’t tend to view those things seriously – I helped people learn a language, often the first foreign language they had learned. I helped people connect to speakers of other languages who they couldn’t normally talk to. For some, I helped forge a sense of identity and community. I slept pretty well on nights when I recognised that it wasn’t just a silly pursuit, but helping people have fulfilling hobbies. There are better examples I could cite, but this one should work.

So, I figured I’d keep running with it. I knew I wanted to stay in academia – I’d wanted to be a professor since I was 6 – so I started applying for Masters and PhDs. Somewhere along the way, I applied to the Erasmus Mundus for Language and Communication Technology. It’s a double masters, at two universities, and it had a nice stipend, too. Most of the PhDs didn’t work out – although now I know a lot more, and I don’t think that’ll happen again – but I did get into this program. So, pretty soon, I graduated, and headed to the continent.

Courses in Germany

So that’s where I’ve been this past year: in Germany. I don’t know German. But I have learned a lot more. Here’s an example of some of my courses:

  • Pattern and Speech Recognition: Here, I learned a lot of math. I was horrified at first, and left the classroom feeling like I had just gotten whiplash from a roller coaster. But you get better, calculus isn’t so bad, and I even had fun coding the weekly examples when I managed to do them. I know a lot more about classification now. If you ever wanted to pick things out of noisy data – be it sound, or whatever – this is a cool thing to learn.
  • Computational Linguistics for Low Resource Languages: This is the coolest course I have ever taken. Reading lots of papers about how to help languages that don’t have much technology. For instance, after the Haitian earthquake, there was a lot of time that went into providing help in Creole. People could text in to an international aid hot line, in their native language, and the message would get translated and sent to aid workers who could help out. Hundreds of lives were saved because of this. That’s pretty awesome.
  • Statistics in Linguistics: If you’ve ever fought with SPSS or not understood a paper because it used a lot of stats, it turns out that that sort of thing is avoidable. There’s nothing quite so cool as seeing results for an experiment you’ve run in R, the prettiest graphing program ever, and being able to read about stats is something that is worth it.

Obviously, I’m not covering a lot of things about the courses. Suffice to say, I’m learning stuff that I think is pretty cool. I have other courses, too, and a job building a repository, and I even managed to think of, write, and publish a paper at a conference with one of my professors here.

Life here

So, what does my average day look like? Well, I’d like to say I sleep in, have waffles, go to class for a bit, have an hour of homework, and then go hang out with friends. That is what a normal person would do, and some of my friends do that. I think that this is possible, and that it is possible to balance your life and work, to make connections easily and sail through a Masters program. That’s not why I came here though – I took this program because I wanted to learn how to code, I wanted to practice, and I wanted to have more time to work on my activities.

So, what I end up doing is waking up, leaning over, and turning on my computer. I then generally work, or work-avoiding-work, for the rest of the day. There’s been a lot of 15 hour workdays here. I somehow manage to keep a girlfriend and to put in a lot of free time slack-lining and walking, but there’s always this pressure at the back of my head to work more. That pressure is so omnipresent I developed a time tracker and task manager to outsource it so I didn’t have to think about it. And that’s not the only downside – I’ve got tendinitis now, or something like it, from being on the computer too much. On top of that, I’ve had a lot of late nights trying to wrap my head around things I hadn’t even known could exist. Graduate school is not for the slacker.

On the other hand, that’s what I chose, and, for me, those are upsides. I get to go to conferences – I’ve been to Bristol, Japan, New Mexico, Mainz, etc. this year. I’m going to the Netherlands, France, Bristol again (for ULAB), and Prague twice this summer. I publish a lot, and that makes me feel good – partly because I like working on papers. I get to do a lot of fun projects for class – right now, I’m working on a project making a corpus out of a social network, and identifying endangered languages in multilingual texts. I get to work with cool people – everyone here is from a different country, and I’ve really been overdosed with French and German since I came here (if I had more time, I would know them now.) What’s more, I sleep well when I do sleep.

There are better blogs out there about what it is like being in graduate school. For me, it’s long hours, learning, and a sense of fulfillment. But I wrote this mainly to tell you that it is a pretty cool thing to do, a masters degree in computational linguistics. I suggest it, and I’m always online if you want to hear more.

Hear an’ Ear

Strangely, hear and ear are not cognates. Huh.

ear:

“organ of hearing,” O.E. eare “ear,” from P.Gmc. *auzon (cf. O.N. eyra, Dan. øre, O.Fris. are, O.S. ore, M.Du. ore, Du. oor, O.H.G. ora, Ger. Ohr, Goth. auso), from PIE *ous- with a sense of “perception” (cf. Gk. aus, L. auris, Lith. ausis, O.C.S. ucho, O.Ir. au “ear,” Avestan usi “the two ears”). The belief that itching or burning ears means someone is talking about you is mentioned in Pliny’s “Natural History” (77 C.E.). Until at least the 1880s, even some medical men still believed piercing the ear lobes improved one’s eyesight. Meaning “handle of a pitcher” is mid-15c. (but cf. O.E. earde “having a handle”). To be wet behind the ears “naive” is implied from 1914. Phrase walls have ears attested from 1610s. Ear-bash (v.) is Australian slang (1944) for “to talk inordinately” (to someone).

hear:

O.E. heran (Anglian), (ge)hieran, hyran (W.Saxon) “to hear, listen (to), obey, follow; accede to, grant; judge,” from P.Gmc. *hausjan (cf. O.N. heyra, O.Fris. hora, Du. horen, Ger. hören, Goth. hausjan), perhaps from PIE *kous- “to hear” (see acoustic). For spelling, see head (n.); spelling distinction between hear and here developed 1200-1550. O.E. also had the excellent adjective hiersum “ready to hear, obedient,” lit. “hear-some” with suffix from handsome, etc. Hear, hear! (1680s) was originally imperative, used as an exclamation to call attention to a speaker’s words; now a general cheer of approval. Originally it was hear him!

Taken from the etymonline.com dictionary. Fun times, eh?

New Google Corpus

As many of you readers know, I love the Google ngram corpus. Well, excitingly, Google has just released a new corpus that should be fun to play around with. Here’s the notice from the Linguist List email:

 

We’re pleased to announce a new corpus — the Google Books
(American English) corpus: http://googlebooks.byu.edu/

This corpus is based on the American English portion of the Google
Books data (see http://ngrams.googlelabs.com and especially
http://ngrams.googlelabs.com/datasets). It contains 155 *billion* words
(155,000,000,000) in more than 1.3 million books from the 1810s-
2000s (including 62 billion words from just 1980-2009).

The corpus has most of the functionality of the other corpora from
http://corpus.byu.edu (e.g. COCA, COHA, and our interface to the
BNC), including: searching by part of speech, wildcards, and lemma
(and thus advanced syntactic searches), synonyms, collocate
searches, frequency by decade (tables listing each individual string, or
charts for total frequency), comparisons of two historical periods (e.g.
collocates of “women” or “music” in the 1800s and the 1900s), and
more.

This American English corpus is just one of seven Google Books-based
corpora that we hope to create in the next year or two (contingent on
funding, which we are applying for in June 2011). If funded, the other
corpora will include British English, English from the 1500s-1700s, and
corpora of Spanish, French, and German (see the listing at
http://ngrams.googlelabs.com/datasets). Each of these corpora will be
based on at least 50 billion words of data, and they should represent a
nice addition to existing resources.

The Google Books (American English) corpus is freely-available at
http://googlebooks.byu.edu, and we hope that it is of value to you in
your research and teaching.

Crash Blossoms

I’ve been saving up my crash blossoms and misparsed headlines. Here are three I thought were particularly odd.

This one just plain confused me. There are four verbs in a row – stock, decline, forces, and increased. Of course, it turns out that ‘stock’ is the noun, as is ‘decline’. ‘Increased’ is a past participle modifying ‘management.’ ‘Forces’ is the verb that was supposed to be read – but this is darned confusing.

This one may not be an actual crash blossom. I was basically very confused because I never see ‘Engineer’ on its own, and ‘Studies’ for me normally implies more than one study, not the third-person singular present form of the verb. There’s no false way to read this – I just got caught up in it, until I realised that ‘Engineer’ is singular, and I had misread it.

Again, probably not an actual crash blossom. I just thought it was very strange to use the term ‘world’. It almost implies, for me, that Japan is not on Earth. Strange.

We so sickened.

This has got to be the worst music video or song I have seen in years. It’s so horrific I can’t look away.

I bring it up here because there is, hidden deep, deep, deep in the recesses of muck, some interesting constructions. I’m referring to:

  • We so excited
  • We gonna have a ball today

Well, so much for our auxiliaries. While you all look for them in the video (if you find them, let me know), I am going to curl up and go back to sleep and hope I wake up back in the 1870s when people spoke proper English.

DINOSAUR PARTICLES!

If that title didn’t catch you, this comic should.

Dinosaur Comics is one of the best things to come out of the internet. I wish I had the willpower to read it every day, but there is only so much screaming I can take. On the other hand, they do have linguistics jokes every now and then. This one involves particles. Well, no, it doesn’t. A particle is a minor function word that has comparatively little meaning and does not inflect, like ‘off’ in ‘make off with the money’. ‘The’ is an article. Not a particle. But anyway. Let’s not focus so much that the sixth panel becomes true.

The King’s Speech and Stuttering

The phenomenal success of the film ‘The King’s Speech’ has propelled speech and language therapy and, in particular, stuttering, into much-talked-about issues. Inspired by this film, Katerina Hilari and Nicola Botting, editors of International Journal of Language & Communication Disorders (IJLCD), have put together a selection of recent IJLCD articles on the theme of stuttering. They chose only recent studies and aimed for articles that highlight the richness of stuttering research that takes place around the world.

The King’s Speech virtual issue can be accessed for free at www.wileyonlinelibrary.com/journal/ijlcd

Classics Research Seminars

Linguistics students may be surprised to hear that there is a whole section of the University dedicated to Historical Indo-European Linguistics and Literature. I’m talking of course about the Classics department, which is somewhat alienated from our own. (This is partially due to scheduling: one can’t take Latin 1 and Linguistics 1 at the same time.) This is a shame, I think, because the Linguistics department sprang from Philologists in Classics, and they have a lot in common.

Just as we have Linguistics Circle, a place to hear about current research in Linguistics from world-renowned scholars, Classics has their own research seminar. Many of the talks are about linguistics, and I think that some of you would find them interesting. I’ve uploaded a schedule of their talks, here. Sadly, they occur immediately before our own talks on Wednesday, which conflicts with the Committee meeting – but really, these are so good on occasion that it might be worth giving LangSoc a miss.

Here’s the schedule: Edinburgh Research Seminar 2011 Semester 2. I hope to see you there every now and then.

Them’s Need It Told

One of the great things about blogging is that there is no deadline and no need for second drafts. Another great thing about blogging is that you can do it at 2 in the morning and justify it as “I’ve had a productive day, so f*** it.” A third great thing is that you don’t technically need to be aware of all of the facts: you can just blag it, of course, or you could appeal to your readership. This post kind of suits all of those.

I was listening to a song today that is incredibly chauvinistic, misogynistic, sexist, and generally downright rude, but it’s damn catchy and I like it despite my feminist’s conscience screaming in pain. I am talking about Calvin Harris’ “The Girls”. As I was walking along singing to myself under my breath in case anyone heard me, I noticed a strange construction:

I like them [ x ] girls.

As in, “I like them tall girls.” Calvin replaces the adjective with various ones I won’t bore you with. But where did this use of ‘them’ come from? I always learned ‘them’ as the accusative plural pronoun – “I saw them,” for instance. After talking it over with my mate Sam Carter of Philsoc, we decided it probably originated as two separate intonational clauses: “I like them. The tall girls.” Over time, this might have gotten shortened to “I like them tall girls”, when ‘them’ began to take on a demonstrative property, as in ‘those’. This might also explain another dialectal variation from the US that I noticed in the show Firefly:

…Bring the good word to them’s need it told.

That’s just cool! A use of ‘them’ that has been fused with the relative pronoun and the subordinator, all in one. In standard English it would be “bring the good word to them who need it told”, where ‘who’ is a fused relative.

What do you all think? Also, suffice to say, I am going to use them new constructions whenevs I can, because they’re shiny. Yes.