

Procedia Social and Behavioral Sciences 18 (2011) 638-642

Kongres Pengajaran dan Pembelajaran UKM, 2010

Use of Comparable Corpus in Teaching Translation

Norsimah Mat Awal*, Intan Safinaz Zainuddin & Imran Ho-Abdullah

Faculty of Social Sciences & Humanities, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia


A comparable corpus is defined as a collection of texts in one language together with texts translated into the same language. A comparable corpus presents the opportunity to discover features that occur more frequently in translated texts, or 'translation universals'. In the Malaysian context, several studies have attempted to look at these translation universals, or at patterns in a Malay translation corpus compared with a corpus of original (non-translated) Malay. This paper focuses on the linguistic nature of the Malay preposition 'untuk'. A linguistic analysis has revealed divergent usage patterns of 'untuk' between Malay translations and original Malay texts. This finding is important because it contributes to the teaching and training of translators.

© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Kongres Pengajaran & Pembelajaran UKM, 2010

Keywords: Translation studies; Malay-English translation; comparable corpus

1. Introduction

In translation classes, students are often reminded to strive to remain 'faithful' in their translation practice. Being 'faithful' in translation entails that translators do their best to transfer the message of the source text into the target language. A translation deemed 'faithful' is one that is able to convey the same or an equivalent meaning in the target text. However, the concept of equivalence in translation is often debated and contested. Munday (2008) gives an overview of the different perspectives on equivalence, from Jakobson's (2000) equivalence in meaning to Nida's (2000) formal and dynamic equivalence and Newmark's (1981) notion of equivalence in his semantic and communicative translation. Munday also notes that Baker (1992) structures her book around different kinds of equivalence: at the word level, above the word level, and at the text and pragmatic levels.

According to Baker (1993), equivalence had become a dominant issue in translation studies, alongside the primacy of the source text; on this view, translations should strive to be as equivalent to their originals as possible. However, Baker (1993) notes that "from the late seventies onwards, the source-oriented notion of equivalence has been gradually replaced by notions which clearly take the target system and culture as a starting point." (p.239). In particular, Baker cites the concept of norms in translation as introduced by Toury (2000). Norms are a category of descriptive analysis: they are options which are regularly taken up by translators at a given time and in a given socio-cultural situation. Baker considers the concepts of norms, laws and universals to be related to the concept of typicality, a notion that has emerged from corpus-based translation studies. Some of the typicalities of the language of a corpus of translated texts are: (i) a marked rise in the level of explicitness; (ii) a tendency towards disambiguation and simplification; (iii) a strong preference for conventional 'grammaticality'; (iv) a tendency to avoid repetitions which occur in the source text, either by omitting or rewording them; and (v) a general tendency to exaggerate features of the target language.

* Corresponding author. Tel.: +603-8921-6562; fax: +603-8925-4577. E-mail address:

1877-0428 © 2011 Published by Elsevier Ltd. doi:10.1016/j.sbspro.2011.05.094

In this article, we discuss our findings on translation universals, or typicalities, in a corpus of translated texts gathered from our translation students. Specifically, we focus on one of the lexico-grammatical items that emerges as linguistically salient: the Malay preposition untuk.

2. Corpus-based Translation Studies

The corpus-based approach was proposed as the 'new paradigm in translation studies' by Laviosa (1998a). The approach draws on tools and techniques from corpus linguistics. Laviosa (1998b) presents the core patterns of lexical use in a comparable corpus consisting of two collections of English narrative prose: one made up of translations from a variety of source languages, the other of original English texts. Among her significant findings, which she terms 'core patterns of lexical use', are that (i) translated texts have a relatively lower percentage of content words versus grammatical words and (ii) the proportion of high-frequency words versus low-frequency words is relatively higher in translated texts.

Olohan and Baker (2000) examine the use of the reporting 'that' in the Translational English Corpus (TEC) at the University of Manchester and compare its frequency with that in a reference corpus, the British National Corpus (BNC). They found that in the BNC, 'that' tends to be omitted more often when used in conjunction with contractions, whereas in the TEC 'that' occurs more frequently with contractions. Corpus-based translation studies have also gained prominence for languages other than English. Tirkkonen-Condit (2005) examines the usage of the particle kin in texts translated into Finnish and texts originally written in Finnish. Her study of a corpus of translated Finnish texts across five genres found that the clitic particle kin was significantly less frequent in translated Finnish (4.6 per 1000 words) than in original Finnish (6.1 per 1000 words). She hypothesizes that items unique to a language tend to be under-represented in translated language.
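Normalising raw counts to a common base, as in Tirkkonen-Condit's 'per 1000 words' figures, is what makes corpora of different sizes comparable. A minimal Python sketch of that arithmetic follows. The function name and the subcorpus counts are hypothetical, chosen only so that the resulting rates echo the figures cited above; they are not her data.

```python
def normalised_frequency(count: int, total_tokens: int, per: int = 1000) -> float:
    """Rescale a raw frequency to occurrences per `per` running words."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    # Multiply before dividing so round figures stay exact in floating point
    return count * per / total_tokens


# Hypothetical counts: 46 hits in a 10,000-word translated subcorpus
# versus 122 hits in a 20,000-word original subcorpus
translated = normalised_frequency(46, 10_000)
original = normalised_frequency(122, 20_000)
print(f"translated: {translated:.1f}, original: {original:.1f} per 1000 words")
```

Although the original subcorpus has more than twice as many raw hits, only the normalised rates can be compared directly.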

3. Methodology

This study follows a corpus linguistics methodology. The following sections describe the research design, the corpus and the data generated.

3.1. Research Design

The study uses data generated from the corpus with the WordSmith program. An initial wordlist of the corpus is generated, and all content words are then removed. The resulting wordlist is a list of the most frequent lexico-grammatical items. A corresponding list of such items was obtained from the UKM-DBP corpus of Malay texts.
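The wordlist step described above can be approximated in a few lines of Python. This is a sketch, not WordSmith's implementation. The tokeniser is a simple regular expression, and `FUNCTION_WORDS` is a hypothetical, deliberately incomplete set of Malay grammatical items used only for illustration. Keeping a closed class of grammatical items has the same effect as discarding the content words.

```python
import re
from collections import Counter

# Hypothetical, incomplete list of Malay grammatical items (illustration only);
# a real study would use a vetted closed-class list.
FUNCTION_WORDS = {
    "yang", "untuk", "dan", "itu", "ini", "telah", "akan", "adalah",
    "ada", "tersebut", "anda", "saya", "kita", "jikalau", "kerana", "bagi",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words, keeping reduplicated forms like anak-anak."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def grammatical_wordlist(text: str) -> list[tuple[str, int]]:
    """Frequency list restricted to grammatical items, most frequent first."""
    counts = Counter(tokenize(text))
    kept = Counter({w: c for w, c in counts.items() if w in FUNCTION_WORDS})
    return kept.most_common()


sample = ("Teknologi yang baru untuk menghasilkan hidrogen. "
          "Ramai yang takut untuk membuat keputusan. Hadiah itu untuk anda.")
print(grammatical_wordlist(sample))
```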

3.2. The Corpus

The Translation Corpus (TC) used in the present study is a specialized corpus of translations produced by students in the translation courses we teach, which are level 1 translation courses at Universiti Kebangsaan Malaysia. All the assignments are English to Malay translations, and the students enrolled in these courses have the necessary proficiency in both the source and target languages. The TC currently contains 23,516 words and represents translated language. Its basic statistics are as follows:

Table 1. Basic statistics of the Translation Corpus (TC)

Total tokens 23,516

Total types 4,384

Type-token ratio (%) 18.6
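The type-token ratio in Table 1 is a percentage: 4,384 types divided by 23,516 tokens, times 100, is approximately 18.6. The Python sketch below reproduces that arithmetic. The helper names and the toy input are illustrative only. WordSmith's own tokenisation rules, which decide what counts as a token, may differ from the plain whitespace split used here.

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Types (distinct word forms) as a percentage of tokens (running words)."""
    if not tokens:
        raise ValueError("cannot compute a type-token ratio for an empty text")
    return len(set(tokens)) / len(tokens) * 100


def ttr_from_counts(types: int, tokens: int) -> float:
    """Same ratio, computed from the summary counts a corpus tool reports."""
    return types / tokens * 100


print(round(ttr_from_counts(4_384, 23_516), 1))  # 18.6, as in Table 1
toy = "untuk anak-anak dan untuk emak".split()
print(type_token_ratio(toy))  # 4 types / 5 tokens -> 80.0
```

Raw type-token ratios fall as a text grows longer, so they are only directly comparable between corpora of similar size.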

For the purpose of comparison, the UKM-DBP 5-million-word corpus of Malay was used to provide statistical information on naturally occurring Malay. The KeyWords program, now incorporated in WordSmith and available at web site, was used to compare frequency lists from the TC and the UKM-DBP corpus. KeyWords allowed us to generate a list of salient items whose frequency in the TC differs significantly from their frequency in the UKM-DBP corpus. Twelve grammatical items are statistically significant in the TC as compared with the UKM-DBP corpus; they are listed in Table 2.
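A keyword comparison reduces to a 2x2 contingency test for each word: its frequency in the TC against its frequency in the reference corpus, each relative to corpus size. WordSmith's KeyWords offers both chi-square and log-likelihood procedures for this. The Python sketch below is an illustrative re-implementation of the two statistics, not WordSmith's code. It computes Pearson's chi-square (without Yates' correction) and the two-cell log-likelihood (G2) for one word. Both scores are signed negative when the word is relatively rarer in the study corpus, which matches how Table 2 marks under-represented items. The counts in the example are hypothetical, not the paper's data.

```python
import math


def keyness(a: int, b: int, n1: int, n2: int) -> tuple[float, float]:
    """Signed Pearson chi-square and log-likelihood (G2) for one word.

    a  -- frequency of the word in the study corpus (n1 tokens in total)
    b  -- frequency of the word in the reference corpus (n2 tokens in total)
    Both scores are negative when the word is relatively rarer in the study corpus.
    """
    if a + b == 0:
        raise ValueError("word does not occur in either corpus")
    total = n1 + n2
    # Expected word frequencies if both corpora used the word at the same rate
    e1 = n1 * (a + b) / total
    e2 = n2 * (a + b) / total
    # Pearson chi-square over the full 2x2 table (word vs. all other tokens)
    observed = (a, b, n1 - a, n2 - b)
    expected = (e1, e2, n1 - e1, n2 - e2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Log-likelihood in the two-cell form common in keyword analysis;
    # a zero count contributes nothing (the limit of x*log x is 0)
    ll = 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)
    sign = 1.0 if a / n1 >= b / n2 else -1.0
    return sign * chi2, sign * ll


# Hypothetical counts: study corpus of 20,000 tokens, reference of 1,000,000
print(keyness(300, 1_000, 20_000, 1_000_000))  # over-used: both scores positive
print(keyness(10, 2_000, 20_000, 1_000_000))   # under-used: both scores negative
```

With one degree of freedom, an absolute score above 3.84 corresponds to p < 0.05 and above 6.63 to p < 0.01.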

4. Findings

The keyword (salient item) data in Table 2 below provide a principled basis for deciding which grammatical words to analyse. The pronouns anda 'you' and saya 'I' are salient, appearing more frequently in the translation corpus, whereas kita 'us' is significantly under-represented, as are ada and itu. The relative clause marker yang also appears more frequently in the translation corpus. Other grammatical items that are salient in the translation texts include the preposition/conjunction untuk 'for' and auxiliaries such as telah and akan. In this paper only one item, the preposition untuk, is analysed further, to illustrate the principle of using corpus data in translation studies. A concordance for untuk was generated using WordSmith Concord, which provided the data for a contextual analysis of each grammatical use of untuk.
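Concordancers such as WordSmith Concord produce key-word-in-context (KWIC) lines: every occurrence of a node word, aligned and shown with its left and right context. A minimal Python approximation is sketched below for illustration only. The `kwic` function, its token-based context window and its output layout are our own assumptions, not Concord's behaviour.

```python
import re


def kwic(text: str, node: str, window: int = 4) -> list[str]:
    """Return key-word-in-context lines with `window` tokens on either side of each hit."""
    tokens = re.findall(r"[\w-]+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the node words line up in a column
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


sample = ("Teknologi yang baru untuk menghasilkan hidrogen. "
          "Hadiah itu untuk mu. Ramai yang takut untuk membuat keputusan.")
for line in kwic(sample, "untuk"):
    print(line)
```

Sorting such lines by the first word to the right of the node is a quick way to bring recurrent patterns such as untuk + verb together.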

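Table 2. Salient grammatical words in the Translation Corpus (TC) compared with the UKM-DBP corpus (negative scores mark items that are less frequent in the TC)

Word   Frequency (TC)   % (TC)   Frequency (UKM-DBP)   % (UKM-DBP)   χ² score   p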

anda 380 1.60 5088 0.14 1150.49 0.00

untuk 361 1.52 30946 0.84 105.96 0.00

yang 1026 4.31 121033 3.28 73.25 0.00

telah 133 0.56 8992 0.24 70.68 0.00

tersebut 81 0.34 4477 0.12 62.57 0.00

akan 217 0.91 19512 0.53 54.44 0.00

adalah 134 0.56 10879 0.29 45.78 0.00

saya 151 0.63 13054 0.35 42.92 0.00

jikalau 8 0.03 96 - 25.66 0.00

ada 42 0.18 12958 0.35 -25.13 0.00

kita 48 0.20 16339 0.44 -38.98 0.00

itu 109 0.46 42306 1.14 -127.52 0.00

5. Discussion

One of the most salient grammatical words in the Malay translation corpus (TC) is untuk. According to Asmah Haji Omar (1993:201), untuk is a conjunction (kata penghubung) that indicates purpose. The item also serves to indicate the beneficiary of an action, as in (a):

a. Dia memasak makanan itu untuk emaknya.
   he cooked food that for mother-POSS
   'He cooked that food for his mother.'

Nik Safiah Karim et al. (1996) record two uses of untuk in Malay: (i) to show who benefits from an item, as in (b); and (ii) to indicate the party for whom something is meant or allocated, as in (c):

b. Rumah untuk pekerja-pekerja sedang dibaiki.
   house for workers PROG PASS-repaired
   'The house for the workers is being repaired.'

c. Hadiah itu untukmu.
   present that for-you
   'That present is for you.'

An alternative view on the use and meaning of untuk is provided by Maslida Yusof (2008), in which the functions of untuk are primarily based on Aktionsart (lexical aspect). She claims that Malay items such as demi, untuk and bagi indicate intentionality in addition to their beneficiary function.

The present analysis of untuk as a salient item in translation language is based on the 361 concordance lines in the Malay translation texts that contain untuk. Two recurrent patterns emerge. The first is untuk + verb, for example 'untuk mengatasi masalah', which appears to translate the English structure 'to overcome the problem', i.e. infinitive 'to + verb'. The second is untuk + noun, for example 'untuk anak-anak', which can be seen as a translation of the English 'for the children', i.e. 'for + noun'. The higher frequency of untuk in the translation corpus is due to the higher frequency of 'untuk + verb' in translation compared with original Malay text. About forty percent of the sentences containing untuk in the TC are of the 'untuk + verb' type, such as 'untuk menghasilkan', 'untuk membuat' and 'untuk mencegah'.
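The split between 'untuk + verb' and other patterns comes from a contextual analysis of the concordance lines. As a rough first pass only, lines could be pre-sorted automatically by the word that follows untuk. The Python sketch below relies on a crude heuristic of our own: a following word that begins with me- (covering the active verbal prefix meN- and its forms mem-, men-, meng-, meny-) is treated as a verb. The heuristic will misfire on nouns such as meja 'table' and will miss verbs without the prefix, so its output would still need manual checking. The sample lines are taken from the examples in this paper.

```python
import re
from collections import Counter

# Crude, hypothetical heuristic: a word starting with "me" plus a letter is taken
# to carry the verbal prefix meN- (me-, mem-, men-, meng-, meny-).
# Known false positives include nouns such as "meja" (table).
MEN_PREFIX = re.compile(r"^me[a-z]")


def classify_untuk(lines: list[str]) -> Counter:
    """Count 'untuk + verb' vs 'untuk + other' by inspecting the token after untuk."""
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"[\w-]+", line.lower())
        for i, tok in enumerate(tokens[:-1]):  # skip the last token: nothing follows it
            if tok == "untuk":
                label = "untuk + verb" if MEN_PREFIX.match(tokens[i + 1]) else "untuk + other"
                counts[label] += 1
    return counts


lines = [
    "untuk mengatasi masalah",
    "Teknologi yang baru untuk menghasilkan hidrogen.",
    "Ramai yang takut untuk membuat keputusan.",
    "Dia memasak makanan itu untuk emaknya.",
    "Rumah untuk pekerja-pekerja sedang dibaiki.",
]
counts = classify_untuk(lines)
share = counts["untuk + verb"] / sum(counts.values()) * 100
print(dict(counts), f"{share:.0f}% untuk + verb")
```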

d. Teknologi yang baru untuk menghasilkan hidrogen.

(New technology to produce hydrogen.)

e. Ramai yang takut untuk membuat keputusan.

(Many are afraid to make decisions.)

f. Ada dua jenis ubat yang digunakan untuk mencegah daripada kanser.

(There are two types of medication used to prevent cancer.)

Example (d) shows how untuk complements the clause by indicating the purpose, or intentionality, of the action, i.e. to produce hydrogen. Similarly, in (e) and (f) the intentionality of the actions is indicated by the untuk phrase. Since this pattern recurs with high frequency only in the translation corpus, we hypothesize that it is 'typical' of Malay translation language (Hunston, 2002). Because untuk + verb is more frequent than untuk + noun in the TC, our observations preliminarily suggest that the higher frequency of untuk in the TC could be partly due to the use of this item to indicate intentionality (Maslida, 2008). In turn, the greater use of untuk for this function is due to the translation strategy of supplying more verbal or active-process information in the translated texts. This reflects Baker's (1993) contention that translation tends towards rewording, which Toury (1993) views as one of the most persistent, unbending norms in translation.

Apart from this translation strategy, which has been accepted as one of the universal features of translation (Baker, 1993), the difficulties of translating the English preposition for into Malay have also been discussed by Ainon and Abdullah (2000). Mistakes occur because untuk is always perceived as the translation of for, as the following example shows:

g. He will do anything for money.
   Mistranslation: Dia sanggup melakukan apa sahaja untuk wang.
   Preferred: Dia sanggup melakukan apa sahaja kerana wang. (kerana 'because of')

6. Conclusion

This study has attempted to use a corpus in translation studies and to apply the findings to the teaching of translation. To do this, we first compiled a translation corpus. Data on salient differences between translated and original language were then generated to inform our teaching. In the present case, we have investigated the over-representation of untuk in Malay translation, where the item is seen as a 'convenient' equivalent of English for. Since for occurs with high frequency in the source language, it serves as a stimulus for the translator to choose untuk as the target. Our investigation, made possible by a corpus-based method applied to a significant number of translated sentences involving untuk, seems to indicate that untuk, a typicality of translated language, is a schematic extension of its function in Malay. Since non-intentional (namely beneficiary) uses of untuk are also found in substantial numbers in the TC, our findings also support the translation universals hypothesis, which claims that translation language resembles the normative standard language of the original language.

7. Acknowledgement

The authors would like to express their gratitude to Universiti Kebangsaan Malaysia for sponsoring this study under the project entitled "The use of comparable corpus in teaching translation" (Project code: UKM-PTS-045-2010).


References

Ainon Mohd & Abdullah Hassan. (2000). Teori dan Teknik Terjemahan. Kuala Lumpur: Persatuan Penterjemah Malaysia.

Asmah Haji Omar. (1986). Nahu Melayu Mutakhir: Edisi Baru. Kuala Lumpur: Dewan Bahasa dan Pustaka.

Baker, M. (1992). In Other Words: A Coursebook on Translation. London: Routledge.

Baker, M. (1993). Corpus linguistics and translation studies. In M. Baker, G. Francis & E. Tognini-Bonelli (Eds.), Text and Technology: In Honour of John Sinclair. Philadelphia: John Benjamins Publishing Company.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Jakobson, R. (2000). On linguistic aspects of translation. In L. Venuti (Ed.), The Translation Studies Reader. London: Routledge.

Laviosa, S. (1998a). The corpus-based approach: A new paradigm in translation studies. Meta, 13(4), 474-479.

Laviosa, S. (1998b). Core patterns of lexical use in a comparable corpus of English. Meta, XLII(4), 1-15.

Maslida Yusof. (2008). Struktur semantik preposisi 'bertujuan': Satu analisis berdasarkan korpus. In Nor Hashimah Jalaluddin, Imran Ho Abdullah & Idris Aman (Eds.), Linguistik: Teori dan Aplikasi. Bangi: Penerbit UKM.

Munday, J. (2008). Introducing Translation Studies: Theories and Applications (2nd ed.). London: Routledge.

Newmark, P. (1981). Approaches to Translation. Oxford: Pergamon.

Nida, E. (2000). Principles of correspondence. In L. Venuti (Ed.), The Translation Studies Reader. London: Routledge.

Nik Safiah Karim, Farid M. Onn, Hashim Hj Musa & Abdul Hamid Mahmood. (1996). Tatabahasa Dewan Edisi Baharu. Kuala Lumpur: Dewan Bahasa dan Pustaka.

Olohan, M. & Baker, M. (2000). Reporting that in translated English: Evidence for subconscious processes of explicitation. Across, 1, 142-172.

Sinclair, J. (Ed.). (1991). Prepositions. London: HarperCollins.

Tirkkonen-Condit, S. (2005). Do unique items make themselves scarce in translated Finnish? In K. Károly & Á. Fóris (Eds.), New Trends in Translation Studies: In Honour of Kinga Klaudy (pp. 177-189). Budapest: Akadémiai Kiadó.

Toury, G. (2000). The nature and role of norms in translation. In L. Venuti (Ed.), The Translation Studies Reader. London: Routledge.