4. Linguistic data in morphosyntax

4.7. Becoming a linguist: Glossing spoken language data

If the object language you are discussing differs from your metalanguage, you should gloss your examples. This means that you provide a morpheme-by-morpheme translation of your data. This is what enables us, as linguists, to analyze the structure of languages we ourselves do not speak.

Normally glosses are provided in a three- or four-line format.

In three-line glosses, the first line provides the object language data. The second line provides a morpheme-by-morpheme translation of the object language into the metalanguage. The third line provides a natural, idiomatic translation into the metalanguage. Gonzalez’s example (4a) from the previous section is repeated below to illustrate.

(1) Finnish
Lähti-kö   Mari?
left-PolQP Mari
‘Did Mary leave?’

(Gonzalez 2023: 3)

A four-line gloss is most often used when phonological processes in the object language make it difficult to identify morpheme boundaries. In these cases, the first line provides the object language without modification and the second line provides the underlying form of each morpheme. The last two lines are the same as in a three-line gloss.

(2) Blackfoot
Nitsiikaakaahsi’taki                        aotoyaakihtsiniki
nit- iik-    aak- yaahssi     -i’taki       a-   oto-      yáakihts  -iniki
1-   intend- FUT- be.good.VTI -feel.emotion DUR- go.to.do- go.to.bed -SUBJ
‘It will make me happy when you go to bed.’

(Déchaine and Wiltschko 2014: 74)

The first line: The object language data

If the object language of your paper is not written in the Latin alphabet but your metalanguage is, then the convention is to transliterate the object language into the Latin alphabet or transcribe it with the IPA. Many languages have a standardized transliteration system which you can use. For example, Mandarin is often transliterated using the system known as pinyin. If you wish to include the original orthography for any reason, you can use a four-line gloss with the orthography as the first line and a transliteration or transcription in the second line.

You must also always mark each sentence for its acceptability. A sentence with no acceptability mark is understood to be fully acceptable. The most common acceptability marks are shown in Table 1.

Table 1. Acceptability markings
Name             Symbol  Meaning                                   Example
Asterisk         *       ungrammatical                             *Cat the apple ate.
Question mark    ?       weakly ungrammatical                      ?I have dived into the pool.
Percentage sign  %       variation in acceptability                %I eat meat anymore.
Number sign      #       semantically or pragmatically ill-formed  #My toothbrush ate an apple.

The most common acceptability mark is the asterisk (*), which marks a sentence as ungrammatical, or, in other words, morphologically or syntactically ill-formed. It is also sometimes used when it is unclear why a sentence is unacceptable. The question mark (?) is used when a sentence feels weakly ungrammatical: somewhat off, but not fully ungrammatical. The percentage sign (%) is used when acceptability varies across speakers or dialects. The example in Table 1 uses positive anymore, which is a grammatical feature of some dialects of American English and Irish English. Finally, the number sign (#), also sometimes called the hash or pound sign, is used for sentences that are semantically or pragmatically ill-formed. The example in Table 1 is semantically ill-formed because toothbrushes cannot eat.

Sometimes linguists will use multiple asterisks or question marks, or combinations of asterisks and question marks, to indicate degrees of unacceptability.
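If you keep your examples in a plain-text file, it can be useful to separate the acceptability marks from the sentence itself. The following is a minimal sketch in Python, assuming that the marks always appear at the very start of the example; the names MARK_MEANINGS and split_judgment are hypothetical and chosen only for illustration.

```python
import re

# The four symbols from Table 1 and their meanings.
MARK_MEANINGS = {
    "*": "ungrammatical",
    "?": "weakly ungrammatical",
    "%": "variation in acceptability",
    "#": "semantically or pragmatically ill-formed",
}

# One or more marks at the start of the example, e.g. "*", "??", or "?*".
_MARK_PATTERN = re.compile(r"^([*?%#]+)\s*")


def split_judgment(example: str) -> tuple[str, str]:
    """Return (marks, sentence); marks is "" for a fully acceptable sentence."""
    match = _MARK_PATTERN.match(example)
    if match is None:
        return "", example
    return match.group(1), example[match.end():]


print(split_judgment("*Cat the apple ate."))
print(split_judgment("??I have dived into the pool."))
print(split_judgment("Jenna ate the pickles."))
```

Because the pattern accepts a run of marks, stacked marks such as ?? are returned together as a single string rather than being interpreted further.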

Sometimes linguists use parentheses to abbreviate multiple examples into one example. If a word is included in the example in parentheses, as in (3a), it means that the word in parentheses is optional. Thus, (3a) means that the sentence is acceptable either without the word quickly, as in (3b), or with the word quickly, as in (3c).

(3) a. Jenna ate the pickles (quickly).
b. Jenna ate the pickles.
c. Jenna ate the pickles quickly.

If a word is included in parentheses with an asterisk also inside the parentheses, as in (4a), that means that the word in parentheses may not be included in that position in the sentence but that the sentence is otherwise grammatical. Thus, the notation in (4a) means that the sentence in (4b), without quickly, is grammatical, but the sentence in (4c), with quickly, is ungrammatical.

(4) a. Jenna ate the (*quickly) pickles.
b. Jenna ate the pickles.
c. *Jenna ate the quickly pickles.

Finally, if a word is in parentheses with an asterisk outside of the parentheses, as in (5a), it means that the word in the parentheses is obligatory. In other words, the notation in (5a) means that a sentence with the word ate, as in (5b), is grammatical, but the same sentence, with ate omitted, as in (5c), is ungrammatical.

(5) a. Jenna *(ate) the pickles.
b. Jenna ate the pickles.
c. *Jenna the pickles.
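The three parenthesis notations in (3) through (5) can be unpacked mechanically. The sketch below, in Python, expands an abbreviated example into the two sentences it stands for. It assumes that each example contains only one parenthesized group and that an asterisk is the only mark used with the parentheses; the function names are hypothetical.

```python
import re

# Matches "(word)", "(*word)", or "*(word)".
_GROUP = re.compile(r"(\*?)\((\*?)([^()]+)\)")


def _tidy(text: str) -> str:
    """Collapse repeated spaces and remove a space left before punctuation."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r" ([.,?!])", r"\1", text)


def expand(example: str) -> list[str]:
    """Return [sentence without the word, sentence with the word], each marked."""
    match = _GROUP.search(example)
    if match is None:
        return [example]
    outer_star, inner_star, word = match.groups()
    before, after = example[:match.start()], example[match.end():]
    without_word = _tidy(before + after)
    with_word = _tidy(before + word + after)
    # An asterisk outside the parentheses means the word is obligatory,
    # so the version WITHOUT the word is the ungrammatical one.
    mark_without = "*" if outer_star else ""
    # An asterisk inside the parentheses means the word is not allowed,
    # so the version WITH the word is the ungrammatical one.
    mark_with = "*" if inner_star else ""
    return [mark_without + without_word, mark_with + with_word]


for abbreviated in (
    "Jenna ate the pickles (quickly).",
    "Jenna ate the (*quickly) pickles.",
    "Jenna *(ate) the pickles.",
):
    print(abbreviated, "->", expand(abbreviated))
```

For each of (3a), (4a), and (5a), the two sentences returned correspond to the (b) and (c) examples, with the version that omits the parenthesized word listed first.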

The second line: Morpheme-by-morpheme glossing

In the second line of a three-line gloss, we include a morpheme-by-morpheme translation. The format for the second-line gloss is highly conventionalized. The most common conventions are summarized in the Leipzig Glossing Rules. If you continue in linguistics, you should refer to these rules when you read and write papers. The explanation here is consistent with the Leipzig Glossing Rules.

Each word in the first line should be aligned with its translation in the second line. When they are not lined up, the examples are much harder to read, as shown by the difference between (6a) and (6b) below. You can line them up manually using spaces and tabs or you can use an invisible table, with each word in a different column.

(6) a. Hindi-Urdu
Ṭi:car=ne Anu=se pu:ch-a: [ki kya: vo ca:i piyegi:].
teacher=ERG Anu=from ask-PFV that KYA s/he tea drink.FUT.3FSG
‘The teacher asked Anu whether she would drink tea.’
b. Ṭi:car=ne Anu=se pu:ch-a: [ki kya: vo ca:i piyegi:].
teacher=ERG Anu=from ask-PFV that KYA s/he tea drink.FUT.3FSG
‘The teacher asked Anu whether she would drink tea.’

(Gonzalez 2023: 8)
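If you are working in plain text with a fixed-width font, the alignment can also be produced automatically. The following Python sketch pads each word and its gloss to the same width. It assumes that both lines contain the same number of space-separated words; align_gloss is a hypothetical helper name.

```python
def align_gloss(object_line: str, gloss_line: str) -> str:
    """Pad each word and its gloss to a shared column width."""
    words = object_line.split()
    glosses = gloss_line.split()
    if len(words) != len(glosses):
        raise ValueError(
            f"{len(words)} words in the first line but "
            f"{len(glosses)} in the gloss line"
        )
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(w), len(g)) for w, g in zip(words, glosses)]
    top = " ".join(w.ljust(n) for w, n in zip(words, widths)).rstrip()
    bottom = " ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return top + "\n" + bottom


print(align_gloss(
    "Ṭi:car=ne Anu=se pu:ch-a: [ki kya: vo ca:i piyegi:].",
    "teacher=ERG Anu=from ask-PFV that KYA s/he tea drink.FUT.3FSG",
))
```

Note that len counts characters as Python stores them, so text that uses combining diacritics may still come out slightly misaligned. In a word processor, the invisible table is usually the more reliable option.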

In the second line, we put content words in lowercase letters, and we put grammatical markers in small caps. If small caps cannot be used for some reason, uppercase letters can be used instead.

Morphemes are separated by hyphens (-) in both the first and second lines. Other markings may optionally be used for special kinds of morphemes, such as the equals sign (=) for clitic boundaries and the tilde (~) for reduplication. If a morpheme cannot be translated by a single word, use periods (.) to separate the pieces of the gloss that all translate parts of the same morpheme.
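Because morpheme boundaries appear in both lines, each word and its gloss should contain the same number of segments. The Python sketch below checks this. It treats the hyphen, equals sign, and tilde as boundaries and deliberately ignores periods, since a period joins several gloss pieces for a single morpheme. The function names are hypothetical.

```python
import re

# Boundary symbols that must match between the object line and the gloss.
_BOUNDARY = re.compile(r"[-=~]")


def segment(word: str) -> list[str]:
    """Split a word (or its gloss) at morpheme boundaries."""
    return [piece for piece in _BOUNDARY.split(word) if piece]


def mismatched_words(object_line: str, gloss_line: str) -> list[tuple[str, str]]:
    """Return (word, gloss) pairs whose segment counts differ."""
    pairs = zip(object_line.split(), gloss_line.split())
    return [(w, g) for w, g in pairs if len(segment(w)) != len(segment(g))]


# The first line here is missing its hyphen, so the first word is reported.
print(mismatched_words("Lähtikö Mari?", "left-PolQP Mari"))
# With the hyphen restored, nothing is reported.
print(mismatched_words("Lähti-kö Mari?", "left-PolQP Mari"))
```

A check like this only counts segments. It cannot tell you whether the boundaries are in the right place or whether the glosses are correct.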

Authors often use abbreviations for the grammatical markings in the gloss. The Leipzig Glossing Rules include a list of common abbreviations. If you use any abbreviations that differ from Leipzig’s, you need to include a key or glossary of your abbreviations somewhere in your paper. Some common places authors put their abbreviation list include a footnote at the beginning of the paper, a footnote on the first example in the paper, a note at the end of the paper, or an appendix in a book. All of the abbreviations used in this textbook are included in a back matter section.
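To find out which abbreviations need to go in your key, you can compare the labels in your gloss lines against the standard list. The Python sketch below does this with a small, illustrative subset of the Leipzig abbreviations, not the full list, and it assumes that grammatical labels are typed in uppercase letters or digits. The names STANDARD_SUBSET and nonstandard_labels are hypothetical.

```python
import re

# ASSUMPTION: a small illustrative subset of the Leipzig list, not the full list.
STANDARD_SUBSET = frozenset({
    "1", "2", "3", "SG", "PL", "F", "M", "NOM", "ACC",
    "ERG", "FUT", "PFV", "DUR", "Q", "SBJV",
})


def nonstandard_labels(gloss_line: str) -> set[str]:
    """Return uppercase or numeric labels that are not in STANDARD_SUBSET."""
    labels = set()
    # Split on spaces and on the boundary symbols used in glosses.
    for piece in re.split(r"[\s\-=~.]+", gloss_line):
        if piece and (piece.isdigit() or piece.isupper()):
            labels.add(piece)
    return labels - STANDARD_SUBSET


blackfoot_gloss = (
    "1- intend- FUT- be.good.VTI -feel.emotion "
    "DUR- go.to.do- go.to.bed -SUBJ"
)
print(sorted(nonstandard_labels(blackfoot_gloss)))  # ['SUBJ', 'VTI']
```

For the gloss line in (2), the sketch reports VTI and SUBJ as labels that would need to be listed in a key. Fused labels such as 3FSG are treated as a single label here, so they would have to be split further before being compared against the list.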

The third line: A natural translation

The third line of a gloss should provide a natural translation into the metalanguage. The third line tells you what the example means.

Sometimes an author will also include a literal translation with the abbreviation “lit.”.

If an example is ungrammatical, it might not really have a meaning. In this case, an author might indicate that the meaning is “intended”.

Finally, if multiple examples in a row all mean the same thing, an author will sometimes include the translation only once, after either the first or the last example.


References and further resources

Reference materials

📑 Max Planck Institute for Evolutionary Anthropology Department of Linguistics. 2015. Leipzig Glossing Rules. https://www.eva.mpg.de/lingua/resources/glossing-rules.php

Sources for examples

Déchaine, Rose-Marie and Martina Wiltschko. 2014. Micro-variation in agreement, clause-typing and finiteness: Comparative evidence from Blackfoot and Plains Cree. In J. Randolph Valentine and Monica Macaulay, eds. Papers of the 42nd Algonquian Conference. SUNY Press. 69–101.

Gonzalez, Aurore. 2023. Interrogative particles in polar questions: The view from Finnish and Turkish. Glossa 8(1): 1–47.
