Last LangSoc Lecture of the Semester! Designing a Part-of-Speech Tagger for Scottish Gaelic

Wed. April 2nd – 18:00 – Lecture Theatre 2 – Appleton Tower

Will Lamb will be talking to us about a project he is currently heading up, here at the university. This marks our last lecture of the semester!

Language technology for Scottish Gaelic remains in an incipient state, compared to recent progress in the area for other European minority languages. It is crucial to provide certain key computational resources and tools for Gaelic if it is to participate fully in future, data-rich research paradigms, and in a variety of NLP-driven applications that would benefit a range of end users. The project ‘An on-line part-of-speech (POS) tagger and gold-standard corpus of Scottish Gaelic’, funded by the Carnegie Trust and Bòrd na Gàidhlig, was devised to help address this situation, with three main aims:

  • Develop a hand-tagged ‘gold-standard’ corpus (GSC) of Scottish Gaelic
  • Develop a POS tagger with an accuracy level of 97%, tested on the GSC
  • Make these resources freely available on the internet

As this one-year project approaches its half-way mark, Will Lamb will be reporting on work-in-progress. In particular, he will be taking stock of some of the challenges of instantiating an NLP pipeline with an under-standardised and morphologically rich language. The Gaelic nominal system, for example, is notably complex and is sensitive to variation conditioned by dialect, register and age. Dr Lamb will also present the results from their first statistically-induced tagger, based upon a finalized 12k word subset of the 80k word corpus.

Making off

I have quite a few Eastern European friends, and sometimes I use English words that they don’t know, and I have to explain them. They are usually the more literary or figurative words, which may not be included in a beginner’s or even intermediate language course. The problem is that many thousands of such words, even if they are quite infrequent, are nevertheless entirely appropriate to use in everyday conversation when called for, and all native speakers know them.

 Because of its peculiar history of having Germanic, French, Latin and sometimes Greek or other strata in the same lexical fields, and because of its worldwide dominance and use in a vast array of fields, it is far from unlikely that English has the largest and richest vocabulary of any language ever known. Royal, regal and kingly stand side by side and are not entirely interchangeable in meaning and ‘feel’, whereas a language such as German has just königlich. Languages such as French and Hindi have two main strata, the native development and then forms based on older, classical versions of themselves (Latin, Sanskrit), but English is unusual in having three such strata, and in having gone from a position of such subjugation that it was semi-creolized with French to a position of such prestige that it can dominate and influence with loanwords almost all other languages on the planet, while still retaining its cheery promiscuous ease in taking its pick of words from those languages, from zeitgeist (German) to wiki (Hawaiian).

 So if I say ‘dastardly’, ‘ominous’, ‘cower’, ‘hold sway’ or ‘teeter’ in conversation with my friends, I suddenly see a look of incomprehension and have to stop to explain, which is sometimes very difficult. Living with non-native speakers of English is probably good training for being a lexicographer though!

 It is sometimes daunting and depressing to consider how vast the vocabulary of English is, especially when I try to learn other languages (at the moment Russian fills me with despair…). I am studying Gaelic and know it fairly well, but reading old poetry I sometimes have to look up half the words: the traditional bards had enormous vocabularies, and they lived in a world where everyone was immersed in these words all their lives and knew what they meant. Such richness in Gaelic is fading fast; it is hard to find in any speaker under 60 years old, as the language gives way to English. But if it is any comfort (agus is beag an sòlas a th’ ann dhomhsa: nach eil de dh’eanchainn ann an ceann duine a dh’fhòghnas airson dà chànan beartach taobh ri taobh? Cha jean ee cosney ping dhyt, agh chamoo nee ee coayl ping dhyt; that is, in Gaelic, ‘and small comfort it is to me: is there not brain enough in a person’s head for two rich languages side by side?’, and in Manx, ‘it will not earn you a penny, but neither will it lose you a penny’), its speakers are replacing Gaelic with a language that probably has the richest and most fertile idiom ever known. (This is not to say Gaelic is inferior; it is still very, very rich: but a language that has millions of speakers all communicating with each other by new-fangled means never dreamed of in other ages, and drawing from so many sources, will inevitably be off the scale as regards fertility and richness of vocabulary.) Of course, Anglophones often boast about their language, and claim it is the most expressive on earth: but in certain respects there may be a grain of truth in it. English is certainly not the most aesthetically pleasing language to me personally (Gaelic or Welsh would be at the top of the list), and other languages may be more expressive than English in certain contexts. For example, I find Celtic languages much better for poetry than English, because in the latter the Romance and Germanic elements jar with each other in verse unless the author is very careful.* I nevertheless maintain that English is probably the all-round most expressive and subtle language in existence.
The language capacity in all humans is equal, and all languages have equal potential and the same baseline of expressiveness (probably the level to which children automatically take pidgins when they make them into creoles), but nonetheless it is possible that because of various external circumstances, languages may not actually be equal in all respects.

 (*For example, I love John Donne but still think he would sound much better in Welsh or German, or even French.)

 Anyway, to get back to my L2 English-speaking friends. One of them is currently reading The Lord of the Rings (a wonderful book for demonstrating one facet of the greatness of English—Tolkien has a very good grasp of the beauty, subtlety and simplicity of earthy native and low-register or register-neutral English words and expressions. His language is generally not consciously archaic, but neither is it modern: it is timeless, as if the author was trying to capture an element of the genius of English which runs all the way through it synchronically and diachronically). My friend says he learnt the expression ‘make off’ (as in ‘depart’) from LOTR.

 This made me think about this expression and related ones. It is a phrasal verb, another thing that strongly characterizes English (and also German and Gaelic, but not Romance). There are many different types of phrasal verbs in English, including ones formed with prepositions and adverbs, and ones that are separable and inseparable. Incidentally, I have a book called The Oxford Dictionary of English Phrasal Verbs, which classifies and lists thousands of such verbs. I found this tome in a small bookshop in another town where it had been sitting so long on a shelf near the sunny window that the red on its spine had been bleached to a sickly pink. After eyeing the book on several visits over several months, I finally could not resist any longer and purchased it. One of the stated aims of the book is to be a guide to L2 learners, since phrasal verbs are one of the trickiest parts of English for foreigners.

‘Make off’ is not very frequent, has a slightly jaunty air, and may be a tad archaic, though ‘make off with’ is often used to mean ‘steal’, as in ‘He made off with the cutlery’. More commonly, one says ‘set off’ or ‘set out’, which are more or less synonymous, except that ‘set out’ is more purposeful and can be used in a non-literal sense: ‘He set out to kill his wife’ can mean either literally that he moved from one place to another in order to kill her, or figuratively that he had the intention to kill her and began planning how to do it, whereas ‘He set off to kill his wife’ can only have the first meaning. Also note that ‘set about’ means almost the same as ‘set out’ in this non-literal sense, but the two differ syntactically in the complement they take: one can say ‘he set about ruining my life’, but not ‘*he set about to ruin my life’; and one can say ‘he set out to ruin my life’, but not ‘*he set out ruining my life’. No wonder it’s confusing!

Much, much more could be said on this topic, but this is enough to show how complicated English (like every other language) is, and what a chore it is to learn all the subtleties. At the end of the day all you can do is keep an ear out for new things, read, read, read a wide range of text types, and look things up in dictionaries, no matter how dull or tiresome that is. However you do it, reading The Lord of the Rings is no bad place to start. I must remember to start reading The Hobbit in German again…