Nerdy Linguistic Stuff
For the next three months, I will be taking a class online through the Graduate Institute of Applied Linguistics in Dallas, TX. Hopefully this will be the second-to-last class I need to take towards my MA in Applied Linguistics, which I hope to finish when we’re back in the US next fall.
The course I’m now taking is an introduction to a computer program called The Bible Translator’s Assistant. TBTA is a “Natural Language Generator” – a computer program that creates a translation of a text into another language, a translation which (hopefully) is accurate to the source text and sounds natural in the target language.
Anyone who has ever used Google Translate knows how bad “machine translations” can be. I have many stories of hapless students who tried to do their Spanish homework for me using an internet translator. The results were usually humorous, often hysterical, and always unfit for publication. Just type a paragraph from a foreign-language novel or storybook into an internet translator and try to make sense of the English “translation” you get. For extra laughs, type a paragraph from an English novel into the computer, translate it to the language of your choice, and then translate that back into English. Or try this paragraph from a Korean children’s story.
So, why do people think that computers can realistically help translate the Bible? Won’t you spend more time cleaning up a poor machine translation than you would just translating manually from scratch? And, I didn’t see Nsenga on the list of options on the internet translator…
Well, there’s a big difference between a Natural Language Generator and a program like Google Translate. Very basically, there are two steps to the translation process: (1) Analyze the source text to determine its meaning, and (2) Reconstruct that meaning using the vocabulary and syntax of the target language. It turns out that computers are pretty good at step two; where they drop the ball is in step one.
The main problem is dealing with source language ambiguities. Is the word “anger” a noun or a verb? Is “read” present tense or past tense? Is the word “may” asking permission, or is it a month of the year? What sense of the word “key” is in focus here, the shiny metal thing or the answers to the test? (This is why a student’s machine-assisted attempt to translate “Can I go to the bathroom?” started with the Spanish word for a tin can…)
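To make the ambiguity problem concrete, here is a toy sketch in Python. The lexicon, the sense labels, and both function names are invented for illustration; no real translation program is this simple. It shows why a word-for-word lookup can land on the tin can, and why tagging the intended sense ahead of time avoids the problem.

```python
# Hypothetical toy lexicon: each English word lists its possible senses,
# and each sense has its own Spanish word.
LEXICON = {
    "can": {"container": "lata", "be_able": "poder"},
    "key": {"metal_object": "llave", "answer_sheet": "clave"},
}


def naive_translate(word):
    """Word-for-word lookup: blindly takes the first sense listed."""
    senses = LEXICON[word]
    return next(iter(senses.values()))


def tagged_translate(word, sense):
    """Lookup where the intended sense has already been decided."""
    return LEXICON[word][sense]


print(naive_translate("can"))              # the tin can
print(tagged_translate("can", "be_able"))  # the verb of ability
```

The second function has nothing to guess, because the hard decision was made before the lookup ever happened.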
A Natural Language Generator like TBTA is fundamentally different from something like Google Translate because with an NLG, the source text is “pre-analyzed” to remove those ambiguities. Instead of an English Bible translation, TBTA starts with a linguistically-coded semantic representation of the source text, specifically designed to remove ambiguities, so that the computer can clearly apply the “rules” of the target language without danger of misunderstanding the source text.
So, TBTA already contains an unambiguous semantic representation of the Biblical text to be translated. A linguist (that’s me) “teaches” the computer the vocabulary and grammatical rules of the target language (in this case, Nsenga). Then the computer applies the rules in a step-by-step way to the semantic representation to produce a translation in the target language.
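The step-by-step idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the dictionary format is not TBTA's actual representation, and the target "language" is made up (the words, the past-tense suffix, and the word order are placeholders, not Nsenga). The point is only that once the input is unambiguous, each rule can be applied mechanically.

```python
# Toy semantic representation: every concept is sense-tagged and every
# grammatical feature is explicit, so nothing is left to guess.
# (Hypothetical format, not TBTA's.)
PROPOSITION = {
    "event": "read-A",   # one specific sense of "read"
    "tense": "past",     # no guessing between present and past
    "agent": "girl-A",
    "patient": "book-A",
}

# Made-up target-language lexicon (placeholder words, not Nsenga).
TOY_LEXICON = {"read-A": "lek", "girl-A": "mina", "book-A": "tabu"}


def lexicalize(prop):
    """Rule 1: swap each concept for its target-language word."""
    out = dict(prop)
    for role in ("event", "agent", "patient"):
        out[role] = TOY_LEXICON[prop[role]]
    return out


def mark_tense(prop):
    """Rule 2: the toy language marks past tense with the suffix -ta."""
    out = dict(prop)
    if out["tense"] == "past":
        out["event"] += "ta"
    return out


def linearize(prop):
    """Rule 3: the toy language puts agent, then patient, then verb."""
    return " ".join([prop["agent"], prop["patient"], prop["event"]])


def generate(prop):
    """Apply the rules in a fixed order, one after another."""
    return linearize(mark_tense(lexicalize(prop)))


print(generate(PROPOSITION))  # mina tabu lekta
```

In this picture, the linguist's job is writing the lexicon and the rules; the computer's job is applying them the same way every time.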
If the linguist did his job well, the translation is grammatically correct Nsenga, targeted in simple vocabulary at about the 6th grade reading level. Since the computer always does exactly what you tell it to, the Nsenga that is produced has exactly the same meaning as (is “semantically equivalent” to) the semantic representation of the Biblical text that it started with. And, as long as the semantic representations contained in TBTA are accurate, the Nsenga text will have the same meaning as the original. Voila. Machine-assisted translation. Natural Language Generation.
Testing has shown that a well-done TBTA translation can be used as the base text for a mother-tongue translator, who then needs to do only “light editing” to make the text smooth and natural. And, since the semantic representation has already been checked by a consultant, it theoretically obviates the need for thorough exegetical checking of the draft. In ideal circumstances, a translation project assisted by TBTA can move approximately 5 times faster than a traditional project, with much less manpower and at much lower cost.
Extra-nerdy linguistic sidebar:
TBTA is built on the principles of Natural Semantic Metalanguage Theory. This theory postulates that there is a small set of innate concepts that are present in every language, and that every word in every language can be defined using those innate concepts. These innate concepts are called “semantic primitives.”
However, using only the very small number of semantic primitives (there are only about 56) would make communication unwieldy and inelegant, and it would be impossible to translate without distorting the message. So, in addition to the semantic primitives, TBTA also uses “semantic molecules”: slightly more complex concepts that are still almost universally expressed by individual lexemes. TBTA has chosen for its basic vocabulary the approximately 3,000 words in Longman’s Defining Vocabulary, which is a carefully-selected list of the words most commonly used when defining other words in the Longman Dictionary of Contemporary English.
The program also makes use of so-called “complex concepts.” These complex concepts are target-language-specific semantic “bundles,” which are like a condensed shorthand way of expressing a series of semantic primitives. It’s sort of like when a person has spent 5 minutes trying to explain something to you and you finally say, “Oh! We call that _____!” For example, in English, “betray” is a complex concept that bundles together something like, “the action of a friend or ally causing a person’s enemies to be able to capture or harm that person.” The computer doesn’t have to use the unwieldy series of simple molecules, but can simply substitute “betray” each time. The complex concepts are manually written into the program in each target language by the linguist as a “rule,” so that, for example, every time the concept “a person who takes care of sheep” is encountered in the semantic representation, “shepherd” is realized on the surface output.
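As a rough sketch of what such a rule does, here is a toy version in Python. The rule table, the function name, and the word-list format are all hypothetical; this is not how TBTA actually stores or applies its rules. It only shows the substitution idea: wherever the long description appears, the single word is output instead.

```python
# Hypothetical rule table: a sequence of simple concepts on the left,
# the single target-language word that bundles them on the right.
COMPLEX_CONCEPT_RULES = [
    (("person", "who", "takes", "care", "of", "sheep"), "shepherd"),
]


def apply_complex_concepts(tokens, rules=COMPLEX_CONCEPT_RULES):
    """Replace every occurrence of a rule's pattern with its single word."""
    out = []
    i = 0
    while i < len(tokens):
        for pattern, word in rules:
            n = len(pattern)
            if tuple(tokens[i:i + n]) == pattern:
                out.append(word)
                i += n  # skip past the whole matched description
                break
        else:
            # No rule matched at this position, so keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


words = "the person who takes care of sheep slept".split()
print(" ".join(apply_complex_concepts(words)))  # the shepherd slept
```

A language with no single word for the concept would simply have no rule in its table, and the longer description would pass through unchanged.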
Currently, the big drawback to TBTA is that crafting the semantic representations of the Bible is a time-consuming and difficult task, and in some particular cases (such as the Psalms) might prove to be impossible. Only a handful of books currently have checked semantic representations. Nevertheless, the program shows promise, and testing with a TBTA-produced translation in Korean has shown that it generates text that tests favorably with other traditionally-translated Bible versions (that is, it’s as good as stuff on the shelf right now).
At any rate, taking this class is getting me one step closer to finishing my MA, which is an important professional credential that can help me with work permits, etc. Exploring TBTA is also interesting from the point of view of the Nsenga advisor, because once the linguistic “rules” of Nsenga have been encoded into TBTA, this program is another resource that we can use in our translation, especially as we move forward into the Old Testament after furlough. Finally, I will be the first African-based linguist to work with TBTA, and it will be interesting (hopefully interesting enough to provide a topic for my major MA thesis project!) to see how the program deals with the unique exigencies of a Bantu language.
So, wish me luck, and forgive me if I’m quieter than usual for the next three months as I add this new responsibility to my schedule. “May all of your utterances be laced with humorous semantic ambiguities.”