Scholarly article on topic 'A corpus-based analysis of relative clause extraposition in Persian'

{"Relative clause extraposition" / "Grammatical weight" / "Information status" / "Word order" / Persian}

Abstract of research paper on Languages and literature, author of scientific article — Mohammad Rasekh-Mahand, Mojtaba Alizadeh-Sahraie, Raheleh Izadifar

Abstract In recent functional and cognitive literature, different motivations have been suggested to influence relative clause extraposition, where the modifying relative clause is not adjacent to the modified head noun. Information status, grammatical weight and verb class are among such motivations. The current corpus-based study of relative clause extraposition attempts to test the predictions of these different motivations in Persian. Using logistic regression analysis, the effects of these factors on the extraposition of relative clauses are investigated. The findings reveal that, among the different influential sources, grammatical weight is the main factor influencing extraposition of relative clauses. Verb class and information status are found to be lower-ranked factors, in that order. The analyses demonstrate that with a special verb class, i.e. linking verbs, which predominantly carry given information in discourse, relative clause extraposition happens more freely. The findings support Hawkins' (2004) principle of domain minimization and provide more evidence for the hypothesis that Persian, a seemingly SOV language, behaves typologically as a VO language, in which heavy constituents shift rightward to facilitate constituent recognition, similar to other head-initial languages.



Contents lists available at ScienceDirect



A corpus-based analysis of relative clause extraposition in Persian

Mohammad Rasekh-Mahand*, Mojtaba Alizadeh-Sahraie, Raheleh Izadifar

Department of Linguistics, Faculty of Literature, Bu-Ali Sina University, Hamedan, Iran


• We studied relative clause extraposition in a less studied language.

• We have shown the role of different factors in relative clause extraposition.

• The findings are important for typology of SOV languages.



© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license.


Article history: Received 20 May 2014; received in revised form 23 February 2016; accepted 25 February 2016; available online 3 March 2016.


Keywords: Relative clause extraposition; Grammatical weight; Information status; Word order; Persian

1. Introduction

Persian can form relative clauses (RCs) from different positions: subject, object, and genitive (Taghavipour, 2005). The language uses gaps obligatorily in subject relative clauses, gaps or resumptive pronouns interchangeably in direct object relative clauses, and resumptive pronouns obligatorily in other positions.

Persian RCs are typically introduced by the relativizer ke ('that'), and Persian is among the languages that formally mark the difference between restrictive and non-restrictive RCs. The suffix -i is required on the head of a restrictive relative clause, but not on the head of a non-restrictive one (Comrie, 1989:139).

* Corresponding author.

Another feature of Persian RCs is that they could be optionally extraposed from their canonical head-adjacent position to a rightward position after the verb. For example, in (1.b.), the RC, (ke yek kife bozorg be dast-as bud; who had a big bag in his hand) is detached from its head-adjacent position in (1.a.) and is moved to a post-verbal position. The extraposed and non-extraposed sentences express the same proposition.

1. a. marde mosenni [ke yek kife bozorg be dast=as bud] vared sod.
      man old COMP a bag big in hand=his was arrive become
      'An old man who had a big bag in his hand arrived.'
   b. marde mosenni vared sod [ke yek kife bozorg be dast=as bud].
      man old arrive become COMP a bag big in hand=his was
      'An old man arrived who had a big bag in his hand.'

Different formal and functional explanations have been provided to account for relative clause extraposition (RCE) in different languages. Formalists have explained RCE in different ways. Some regarded it as rightward movement (Baltin, 1981; Ross, 1967), some as discontinuous constituency of the NP (McCawley, 1987, 1998), some as leftward movement of the non-extraposed elements (Kayne, 1994), others as base-generated (Rochemont and Culicover, 1990), and a last group as simple adjunction (Culicover and Jackendoff, 2005). None of these accounts, however, appears to provide a satisfactory explanation of the problem (Baltin, 2006). On the other hand, functionalists have also tried to explain RCE. The main explanations they have provided are based on discourse information (Takami, 1999; Kuno and Takami, 2004; Huck and Na, 1990), grammatical weight (Arnold et al., 2000; Francis, 2010; Francis and Michaelis, 2014; Wasow, 2002; Hawkins, 1999; Yamashita and Chang, 2001) and predicate types (Rochemont and Culicover, 1990). Strunk (2010, 2014) discussed six factors relevant to RCE in German, concluding that the distance of extraposition and the length of relative clauses are the most important factors in RCE.

The present study has two aims. Most of the explanations provided in the literature for RCE are based on data from well-studied languages, mainly English. The first aim, following functional views, is therefore to find out which functional motivation(s) play the most significant role in Persian RCE, and which explanation the findings support.

The second goal is a typological one. Persian exhibits hybrid features regarding word order correlations (Dryer, 1992): while it is an SOV language in surface sentence form, most of its other features are similar to those of SVO languages. For instance, it has prepositions and Noun-Genitive order. It has been argued that the language is moving from an OV type to a VO type (Dabir-Moghaddam, 2001, 2006). Following Dryer (1992), Dabir-Moghaddam (2001) studied word order correlations in Persian and some other Iranian languages. After examining nineteen of Dryer's correlations that are applicable to Persian, he concludes that "in about two third of the correlation pairs Modern Persian is compatible with strong VO type, both in its own area and globally. This finding is particularly interesting because Modern Persian in terms of the linear order of its constituents is predominantly SOV, both in its written and spoken varieties" (Dabir-Moghaddam, 2001, p. 21). Based on these observations, he hypothesized that Persian is in the process of a syntactic change from an OV type to a VO type language. To examine this syntactic change further, the present study asks whether Persian, with respect to RC extraposition, acts like an OV language (Japanese, for example), putting RCs in a leftward position, or like a VO language (English, for example), putting RCs in a rightward position, depending on the weight of the phrases. In English-like languages, heavy constituents are preferred after short ones, while in Japanese-like languages the situation is reversed (Hawkins, 2004, p. 109).

This paper is organized as follows. Section 2 briefly reviews the related literature, especially functional studies, to enumerate the different motivations suggested for RCE. Section 3 presents the corpus analysis and its findings to answer the first question. Section 4 addresses the second question, focusing on the typological implications of the study. Section 5 concludes.

2. Review of literature

Apart from the formalist accounts of RCE, two main positions have been proposed in the functionalist literature for this phenomenon, known as "newness (and importance of information)" (Huck and Na, 1990, 1992; Kuno and Takami, 2004) and "grammatical weight" (Arnold et al., 2000, 2004; Hawkins, 1994, 2004; Wasow, 1997a; Yamashita and Chang, 2001; Francis, 2010).

Regarding the first position, scholars have argued that in RCE, the relative clause carries newer or more important information, compared with the VP (Huck and Na, 1990; Kuno and Takami, 2004; Takami, 1999). For example, Kuno and Takami (2004, p.186) introduce a discourse constraint on extraposition from subject NPs, according to which the "extraposition from subject NPs is allowed only if the predicate that the extraposed constituent crosses over, represents anaphorically or deictically grounded information, and if the extraposed constituent represents the most important information in the sentence". In example (2) (from Kuno and Takami, 2004, p.185), the extraposition of a relative clause is acceptable because the predicate of this sentence i.e., did, is anaphorically grounded and the extraposed relative clause carries new information.

2. Speaker A: Who gave you this book?

Speaker B: A stranger did who said he had come from


Takami (1999:27) arrives at a similar explanation and states that "Extraposition from NP is allowed only when the element extraposed to sentence-final position is interpreted as being more important than the rest of the sentence". Arnold et al. (2000:30) also argue that despite different terms and applications of old versus new information, following Clark and Clark (1977:548), it could be generalized that "Given information should appear before new information", stated as the Given-before-New principle by Gundel (1988:229). Francis (2010:38) observes that RCE is in line with the general tendency among languages to put the focused constituents later than the old information in the sentence.

The second position relies on weight (interchangeably called length or heaviness) of different constituents and its relation with order of constituents. 'End-weight' is a term, first introduced by Quirk et al. (1972), to refer to an old observation that heavy elements tend to come late in the sentence (Behaghel, 1930).

Heaviness could be simply defined as the number of words in a constituent, or it could be measured as "the difference in length between two constituents, in terms of number of words" (Arnold et al., 2000:29). It is argued that heavier constituents are longer or structurally more complex than lighter constructions (Francis, 2010:38). Some of the scholars, like Arnold et al. (2000) argue that variations in post-verbal constituent ordering in English could be attributed to both grammatical complexity (heaviness) and information status (newness). Wasow (1997a:83) gives examples of 'weight-sensitive' constructions in English; like heavy-NP-shift, particle movement and dative movement, in which "two alternative orderings of post-verbal constituents are possible, and the choice of which ordering to use appears to be determined at least in part by the length and complexity of the phrases in question". He also mentions that the relative weights of the constituents in question are important, not just the absolute weight of one constituent. Heavy NP Shift is the most studied phenomenon in which the role of grammatical weight is emphasized. In this phenomenon, the direct object of a transitive verb occurs at the end of the sentence following an oblique argument or an adjunct. Different studies demonstrate that Heavy NP Shift normally occurs when the NP is longer than the PP (as in 3.b.), and if the NP is light, the outcome is not acceptable (3.c.) (Francis, 2010:39):

3. a. The waiter brought the wine we had ordered to the table.
   b. The waiter brought to the table the wine we had ordered. (Heavy NP-Shift)
   c. ? The waiter brought to the table the wine.

By observing different weight-sensitive phenomena, Wasow (2002:3) reformulates the earlier principles of Behaghel (1909) and Quirk et al. (1972) as the Principle of End Weight:

4. Principle of End Weight (PEW): Phrases are presented in the order of increasing weight.

It is generally argued that placing longer constituents at the end of sentences helps both speakers and listeners: speakers gain more time to formulate the heavier constituents (Wasow, 1997b; Arnold et al., 2000), and listeners are able to process the sentences more efficiently since they recognize the major constituents of the sentences faster (Hawkins, 1994,2004). Stallings et al. (1998) proposed that speakers prefer short constituents before longer ones. Lohse et al. (2004) showed that sentences with verb and particle adjacent are more common in corpora, especially when the direct object NP was long. Gonnerman and Hayes (2005) report the same findings regarding verb particle adjacency and direct object NP length.

Wasow (2002:7) discusses the probable role of PEW in extraposition from NP and argues that "both the final position of the usually heavy extraposed element and the lightening of the NP serve to increase the probability of satisfying PEW." Francis (2010), following Wasow's suggestion, conducted an empirical study mainly concerned with the possible role of grammatical weight in licensing RCE in English and with its role in the processing, acceptability, and usage of RCE. She carried out two psycholinguistic experiments and a corpus study to determine the weight effects in RCE. She reports that readers showed a significant reading time advantage for RCE when the RC was heavy, but when the RC was light, no difference was found between RCE and the canonical structure. In the second experiment, an acceptability judgment task, participants judged canonical sentences as significantly more acceptable than RCE sentences when the relative clause was light, but when the relative clause was heavy, this difference disappeared. In her corpus study, she calculated the relative weight of relative clauses and VPs and found that, on average, extraposed RCs are longer than the VPs, while non-extraposed ones are shorter. She also found that as the ratio of VP length to RC length increased, the proportion of sentences with RCE decreased.

There seems to be a paradox in RCE studies. On the one hand, the discontinuous dependency in RCE complicates the syntax and presumably increases processing complexity (Wasow, 2002:7); on the other hand, Francis (2010) shows that moving heavy constituents to the end in RCE can facilitate processing. Francis (2010:40) argues that these findings are not paradoxical and that they "support Hawkins (2004) theory and help explain why RCE is preferred in some contexts despite the discontinuous dependency." Hawkins' (1994, 2004) performance-based theory of constituent order deals with the effects of different factors, among them grammatical weight, on sentence processing and predicts that, with relative clauses, processing efficiency increases when the heavier constituent is placed at the end. One of the major claims of this theory is the principle of "Minimize Domains", as defined in (5):

5. Minimize Domains: The human processor prefers to minimize the connected sequences of linguistic forms and their conventionally associated syntactic and semantic properties in which relations of combination and/or dependency are processed. The degree of this preference is proportional to the number of relations whose domains can be minimized in competing sequences or structures, and to the extent of the minimization difference in each domain (Hawkins, 2004: 104).

Based on the principle of minimize domains, rearrangement of heavy constituents is preferred to minimize the domain and make the processing easier. Some linear orders, e.g. RCE, reduce the number of words and their combinations needed to be processed, and hence make the phrase structure processing faster. Accordingly "the principle of minimize domains defines a preference for the most minimal surface structure domains sufficient for the processing of each combinatorial and dependency relation" (Hawkins, 2004:32). Domains are calculated in terms of immediate constituents to word ratios (Hawkins, 1994). "A Phrasal Combination Domain (PCD) is, in effect, the smallest domain that permits a phrasal node to be constructed along with its daughter immediate constituents" (Hawkins, 2004:107).

Some studies of RCE from object position in German (Uszkoreit et al., 1998; Konieczny, 2000; Strunk, 2010, 2014) found evidence in support of Hawkins' predictions. Francis (2010) studied RCE from subject position in English and hypothesized that "extraposition should be preferred in language use in cases where relative clause length exceeds VP length, and this preference should be stronger as the difference in length becomes greater" (Francis, 2010:47). Her corpus analysis and two experiments (a reading time task and an acceptability judgment task) showed that grammatical weight is an important factor in RCE, and the study "provides evidence that the increased incidence of RCE with heavy relative clauses is related to processing efficiency" (Francis, 2010:68), which is compatible with Hawkins' (2004) theory of domain minimization.

Hawkins (2004) predicts that in head-final languages such as Japanese and Turkish, heavy constituents should be shifted frontward to facilitate constituent recognition. He asserts that "performance data from Japanese reveal an equally principled set of preferences, but for the mirror-image long-before-short pattern" (Hawkins, 2004:108). He argues that while "postposing a heavy NP or PP to the right in English shortens PCDs and increases IC-to-word ratios, preposing heavy constituents in a head-final language has the same effect, since the relevant constructing categories (V for VP, P for PP, etc) are now on the right" (ibid.). This prediction was evaluated experimentally for Japanese by Yamashita and Chang (2001). They used a sentence recognition task in which they asked participants to construct a sentence using sentence components presented on a screen in any order. They demonstrated that in Japanese, an SOV language which allows scrambling, long phrases are preferred before short ones. In other words, in contrast to English, an SVO language which shows a "short-before-long" preference, Japanese demonstrates a "long-before-short" tendency. They also reported that Japanese corpus studies confirm the tendency for heavy phrases to come before short ones via scrambling (Yamashita and Chang, 2006). Chang (2009) also argues that heavy NP shift goes in different directions in English and Japanese, and that Japanese speakers prefer to place heavy noun phrases earlier in the sentence.

These experimental and corpus studies have emphasized that both speaker and listener benefit from the minimization of syntactic domains. They support Hawkins' theory that heaviness and order are important factors in sentence complexity and processing efficiency (Gonnerman, 2012). Following Francis (2010) and Yamashita and Chang (2001), this paper attempts to test the predictions of Hawkins' (2004) theory in Persian. As discussed in the introduction, Persian has specific typological features which are different from those of English and Japanese, and studying its structural alternations can help gain a better understanding of universal aspects of processing typology. While it is an SOV language in surface forms, most of its other typological features correlate with VO languages. In addition, it is a pro-drop language and it allows scrambling (Karimi, 2005). Through a corpus analysis (section 3), the study attempts to find out which functional motivation(s), i.e. information status, weight or any other factor, play a role in Persian RCE. The findings are then interpreted to decide on the word order typology of Persian, showing whether it is more in line with OV or VO patterns. This might help broaden the view on the relation between efficiency of processing and constituent order and provide more cross-linguistic evidence for Hawkins' (2004) theory.

3. Corpus analysis

This work relies on a corpus analysis, based on written and spoken texts from the Persian Linguistic Database (PLDB)1 which is the first on-line database for contemporary (Modern) Persian and contains selected corpora of all varieties of the Modern Persian language in the form of running texts. Some of the texts are annotated with grammatical, pronunciation and lemmatization tags. The corpus analyzed for this study comprises three hundred thousand words, in which 757 relative clauses were found, of which 167 were extraposed from their canonical position.2 That is, about 22 percent of the sampled relative clauses were extraposed, which suggests that RCE is not a rare phenomenon in Persian. As expected from Keenan and Comrie's (1977) Accessibility Hierarchy, relativization from higher positions (when the syntactic function relativized within the relative clause is subject or direct object) is more frequent. Based on the data, 78 percent of relative clauses are from subject position, 12 percent from direct object, 8 percent from genitive, and only 4 percent from oblique position.

2 In figures and tables of this paper, the following abbreviations are used: RCC: relative clause in canonical position, RCE: relative clause in extraposed position, RCL: relative clause length, VPL: verb phrase length.

After collecting the RCs, the following categories were coded manually for each case: extraposition status (RCE or RCC), VP length (in number of words), RC length (in number of words), predicate type (event, state, passive, linking, unergative, or unaccusative verbs), head noun information status (new, accessible, or given), and main predicate information status (new, accessible, or given). Three coders worked separately; controversial items were discussed by all three, and the 56 disputed cases were excluded. The VP and RC lengths were measured in words, and the VP-to-RC ratio was obtained by dividing VP length by RC length (cases in which the RC was inside the VP were not considered). Each sentence was coded for predicate type. To code the information status of the head noun and the main predicate, the researchers followed Michaelis and Francis (2007) and Gregory and Michaelis (2001): given (prior mention of the referent in the preceding 20 lines), accessible (mention of the category including the referent in the preceding 20 lines (in written form)), and new (no mention). In the following subsections, three factors which are supposed to affect RCE are investigated: weight, information status and verb type.
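The three-way information status scheme described above can be illustrated with a rough sketch. The study's coding was done manually; the function below is only a crude automated approximation that assumes simple substring matching, and the names `information_status`, `category_terms` and `preceding_lines` are hypothetical and not taken from the paper.

```python
def information_status(referent, category_terms, preceding_lines, window=20):
    """Return 'given', 'accessible', or 'new' for a referent.

    Assumption: a referent counts as mentioned if its string occurs in one of
    the preceding `window` lines; human coders would use far richer judgment.
    """
    context = [line.lower() for line in preceding_lines[-window:]]
    # given: the referent itself was mentioned in the preceding lines
    if any(referent.lower() in line for line in context):
        return "given"
    # accessible: only a category that includes the referent was mentioned
    if any(term.lower() in line for term in category_terms for line in context):
        return "accessible"
    # new: no prior mention of the referent or its category
    return "new"


# Hypothetical mini-context, not corpus data.
lines = ["The library bought new books.", "Readers were pleased."]
print(information_status("books", ["library"], lines))
print(information_status("novel", ["books"], lines))
print(information_status("teacher", ["school"], lines))
```

The three calls illustrate one case for each category under these assumptions.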

3.1. Weight

In line with the Minimize Domains principle (5), the main prediction is that relative clause extraposition should be preferred whenever the RC length exceeds the VP length. Since extraposition minimizes the length of the domain that needs to be processed, the prediction is that in RCE the RC should be longer on average than the VP. In canonical structures, on the other hand, the RC should be shorter than the VP. Out of the 757 relative clauses collected for this study, 590 were used in canonical position and 167 in extraposed position. Table 1 shows the mean number of words in RC and VP in RCC. A paired-samples t-test shows that there was no significant difference between the lengths of RC and VP in RCC; t(589) = 1.549, p = 0.122. The means are quite close to each other.

Table 2 shows the mean number of words used in RC and VP in RCE. The paired sample t-test shows that there was a significant difference in the length of RC/VP in RCE; t(166) = 14.061, p < 0.001. The means are far from each other. These results suggest that length really does have an effect on extraposition.
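As an illustration of the paired-samples t-test used here, the following minimal sketch computes the t statistic from per-sentence RC and VP lengths. The data are hypothetical (the per-sentence lengths of the corpus are not given in the paper), and `paired_t` is an illustrative helper, not the authors' procedure.

```python
import math
import statistics


def paired_t(xs, ys):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)         # sample SD of pairwise differences
    std_error = sd_diff / math.sqrt(len(diffs))
    return mean_diff / std_error, len(diffs) - 1


# Hypothetical (RC length, VP length) pairs in words, NOT the study's data.
rc_lengths = [8, 6, 9, 7, 10, 5]
vp_lengths = [1, 2, 1, 3, 2, 1]
t_stat, df = paired_t(rc_lengths, vp_lengths)
print(f"t({df}) = {t_stat:.3f}")
```

A large positive t, as reported for RCE, indicates that RCs are consistently longer than the VPs they are paired with; a t near zero, as reported for RCC, indicates no consistent difference.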

These findings support the hypothesis that grammatical weight is a determining factor in RCE. In RCE, the RC is considerably longer than the VP, while their lengths are not very different in RCC. It is the weight of the VP (or the extraposition distance), not the sheer weight of the RC, that determines the likelihood of extraposition. Note that the present study did not consider any RCE from direct object position in the sample, so its conclusions only concern RCE from subject position in Persian. As seen in Tables 1 and 2, the mean lengths of RCs in RCC and RCE are very close to each other (7.21 and 7.71, respectively), but the mean lengths of VPs are far apart (6.67 in RCC and 1.97 in RCE). So what matters in extraposition is the extraposition distance (the VP length), which is very short in RCE compared to RCC. It can be concluded that grammatical weight plays an important role in RCE in Persian, supporting the findings of Francis (2010:64) for English.

Table 1
The mean number of words in RC and VP in RCC.

                        Mean   N     Std. Deviation   Std. Error Mean
Number of words in RC   7.21   590   5.258            .216
Number of words in VP   6.67   590   6.973            .287

Table 2
The mean number of words in RC and VP in RCE.

                        Mean   N     Std. Deviation   Std. Error Mean
Number of words in RC   7.71   167   4.282            .331
Number of words in VP   1.97   167   3.051            .236

The results of the weight analysis show that RCE helps to minimize domains, as Hawkins (2004) predicted. The sentences in (6) show the effect of RCE on minimizing domains.

Sentence (6a) is an extraposed sentence that occurred naturally in the corpus. The VP in this sentence is just one word long. The PCD (Hawkins, 2004:107) for this sentence is 5 words ('in moskele bozorgi 'ast ke): after processing these five words, the hearer can construct the mother node (S) and has access to all its immediate constituents. In the canonical version of this sentence (6b), by contrast, the hearer needs to recognize 17 words (the whole sentence) to construct the mother node and its immediate constituents. For (6a), the IC-to-word ratio of the PCD (the ratio of immediate constituents to words) is 3/5, while for (6b) it is 3/17. Processing is therefore much easier in the extraposed version, since the RC is longer than the VP and extraposition helps to minimize the domains.
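The comparison of the two domains can be written out as a small worked computation. This is only a sketch using the counts stated in the text (3 immediate constituents over 5 versus 17 words); the helper name `ic_to_word_ratio` is an illustrative assumption.

```python
from fractions import Fraction


def ic_to_word_ratio(n_ics, n_words):
    """IC-to-word ratio of a Phrasal Combination Domain (PCD)."""
    if n_ics <= 0 or n_words <= 0:
        raise ValueError("counts must be positive")
    return Fraction(n_ics, n_words)


# Counts as stated in the text for sentences (6a) and (6b).
extraposed = ic_to_word_ratio(3, 5)    # (6a): 3 ICs recognized within 5 words
canonical = ic_to_word_ratio(3, 17)    # (6b): 3 ICs recognized within 17 words

print(f"(6a) {extraposed} = {float(extraposed):.2f}")
print(f"(6b) {canonical} = {float(canonical):.2f}")
print("extraposed order is more efficient:", extraposed > canonical)
```

A higher ratio means that the same number of immediate constituents is recognized within fewer words, which is the sense in which the extraposed order minimizes the domain.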

The analysis of the corpus shows that RCE is most probable in sentences where the RC is about four times longer than the VP (as shown in Fig. 2).

Fig. 2 shows that as the ratio of VP length to RC length increases, the rate of extraposition decreases. When the relative clauses were four times longer than the VP, more than 45 percent of them were extraposed, and this share decreased as the lengths of RC and VP approached each other. It can thus be concluded that the relative length of RC and VP is an important factor in RCE in Persian. The next section explores the role of information status in RCE.
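The kind of tabulation behind Fig. 2 can be sketched as follows: each sentence is assigned to a relative weight range by its VP-to-RC ratio, and the percentage of extraposed RCs is computed per range. The records are hypothetical, and the treatment of values lying exactly on a bin boundary is an assumption, since the paper does not state it.

```python
from collections import defaultdict


def weight_bin(vp_len, rc_len):
    """Assign a VP-to-RC length ratio to one of the ranges used in Fig. 2.

    Assumes rc_len >= 1. Boundary handling (e.g. a ratio of exactly 0.5)
    is an assumption made for this sketch.
    """
    ratio = vp_len / rc_len
    if ratio < 0.25:
        return "<0.25"
    if ratio <= 0.5:
        return "0.25-0.5"
    if ratio <= 0.75:
        return "0.5-0.75"
    if ratio <= 1:
        return "0.75-1"
    return ">1"


def extraposition_rate_by_bin(records):
    """Map each range to the percentage of its RCs that are extraposed."""
    counts = defaultdict(lambda: [0, 0])      # label -> [extraposed, total]
    for vp_len, rc_len, is_extraposed in records:
        label = weight_bin(vp_len, rc_len)
        counts[label][1] += 1
        if is_extraposed:
            counts[label][0] += 1
    return {label: 100.0 * ext / total for label, (ext, total) in counts.items()}


# Hypothetical (VP length, RC length, extraposed?) records, NOT corpus data.
sample = [(1, 8, True), (1, 6, True), (1, 5, False),
          (3, 7, False), (6, 7, False), (9, 6, False)]
rates = extraposition_rate_by_bin(sample)
for label, rate in rates.items():
    print(f"{label}: {rate:.1f}% extraposed")
```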

3.2. Information status

To determine the role of information status, the head nouns and the main verbs of matrix clauses were coded according to their information status as given, accessible, or new (Michaelis and Francis, 2007). The two main predictions regarding information status are: (a) the percentage of given predicates should be higher in RCE than in RCC, and (b) the new head nouns in RCE should outnumber the new head nouns in RCC (Rochemont and Culicover, 1990). Tables 3 and 4 show the information status of predicates and head nouns.

6. a. 'in moskele bozorgi 'ast [ke bâ cand sad milyun riyal hazine niz qâbele jobrân naxâhad bud].
      this problem big is that with some hundred million Rials budget also possible remedy won't be
      'This is such a big problem that will not be solved with hundred million Rials.'
   b. 'in moskele bozorgi [ke bâ cand sad milyun riyal hazine niz qâbele jobrân naxâhad bud] 'ast.
      this problem big that with some hundred million Rials budget also possible remedy won't be is
      'This is such a big problem that will not be solved with hundred million Rials.'

Fig. 2. Percentage of extraposed RCs by ratio of VP length to RC length (x-axis: relative weight range, with the bins <0.25, 0.25-0.5, 0.5-0.75, 0.75-1, >1).

Table 3
Information status of predicate in RCE and RCC.

             RCE                                   RCC
             No.   % in RCE   % in 2 structures    No.   % in RCC   % in 2 structures
New          83    49.5       20.5                 320   54.5       79.5
Accessible   15    9          18.7                 65    11         81.3
Given        69    41.5       25.2                 205   34.5       74.8
Total        167   100                             590   100

Table 4
Information status of head nouns in RCE and RCC.

             RCE                                   RCC
             No.   % in RCE   % in 2 structures    No.   % in RCC   % in 2 structures
New          106   63.5       22.1                 373   63         77.9
Accessible   35    21         32.4                 73    12.5       67.6
Given        26    15.5       15.3                 144   24.5       84.7
Total        167   100                             590   100

Table 5
Verb classes in RCE and RCC.

Verb class      RCE                                   RCC
                No.   % in RCE   % in 2 structures    No.   % in RCC   % in 2 structures
Unergatives     3     1.7        3                    104   17.6       97
Unaccusatives   17    10.1       24.3                 53    8.9        75.7
State verbs     18    10.7       28.6                 45    7.6        71.4
Event verbs     23    13.7       10.2                 203   34.4       89.8
Linking verbs   100   59.8       43.6                 129   21.8       56.4
Passives        6     3.5        9                    59    10         91
Total           167   100                             590   100

Table 6
In/Definite heads in RCE and RCC.

             RCE                                   RCC
             No.   % in RCE   % in 2 structures    No.   % in RCC   % in 2 structures
Definite     78    46.7       18.4                 346   58.6       81.6
Indefinite   89    53.3       26.7                 244   41.4       73.3
Total        167   100                             590   100

Table 7
Classification table.a

Observed              Predicted Stype            Percentage correct
                      Canonical    Extraposed
Stype   Canonical     527          23            95.8
        Extraposed    82           85            50.9
Overall percentage                               85.4

Stype = sentence type. a The cut value is .500.

As shown in Table 3, 50.5 percent of the predicates in RCE carry old information (the sum of given and accessible information), compared with 45.5 percent in RCC. The prediction that given predicates appear more often in RCE than in RCC is thus borne out, though this difference of 5 percentage points is not enough to assign the main role to information status. The difference in the information status of predicates between RCC and RCE was not significant (p = 0.468) (see Table 7).

The data in Table 4 also show that the proportions of new head nouns in the two kinds of structure are very close (63.5% in RCE and 63% in RCC). Hence, there is not enough evidence to support the prediction that new head nouns appear more often in RCE than in RCC.

These findings suggest that information status may be a factor influencing RCE, though a less effective one than weight. The effects of the different factors on RCE, and their ranking according to a logistic regression analysis, are compared in section 3.4.
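The classification table reported for the logistic regression (Table 7) can be read as a confusion matrix at the stated cut value of .500. As a minimal sketch, the percentages in that table can be recomputed from its four counts; the function and variable names below are illustrative assumptions, with "extraposed" treated as the positive class.

```python
def classification_summary(tn, fp, fn, tp):
    """Percent correct per observed class and overall, from a 2x2 table."""
    canonical_correct = 100.0 * tn / (tn + fp)     # observed canonical
    extraposed_correct = 100.0 * tp / (fn + tp)    # observed extraposed
    overall = 100.0 * (tn + tp) / (tn + fp + fn + tp)
    return canonical_correct, extraposed_correct, overall


# Counts from Table 7 (rows: observed, columns: predicted).
canonical_pct, extraposed_pct, overall_pct = classification_summary(
    tn=527, fp=23, fn=82, tp=85
)
print(round(canonical_pct, 1), round(extraposed_pct, 1), round(overall_pct, 1))
# prints: 95.8 50.9 85.4
```

The recomputed values match the published ones and make the asymmetry visible: the model classifies canonical sentences far more reliably than extraposed ones.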

3.3. Verb classes

To test whether verb type plays any role in RCE, the corpus was analyzed by verb class. The verb classes examined were transitive event verbs, transitive state verbs, unergatives, unaccusatives, linking verbs and passives, since Francis (2010) asserts that RCE appears with a higher frequency with unaccusative/passive predicates. Table 5 shows the frequency of the different verb classes in RCE and RCC.

The data presented in this table show that in RCC most of the verbs are transitive (event and state verbs), with linking verbs in second position, while in RCE this order is reversed: linking verbs outnumber transitive verbs. In fact, the share of linking verbs in RCE is almost three times that in RCC (59.8% and 21.8%, respectively). In all verb classes except linking verbs, the number of RCCs is much higher than the number of RCEs. For example, comparing unergative and linking verbs, Table 5 (columns 4 and 7) shows that most unergative verbs occur in RCC (97%) and only a small number in RCE (3%), whereas for linking verbs 43.6% occur in RCE and 56.4% in RCC. So, with respect to extraposition, only linking verbs behave differently from the other verb classes.

The next question is why linking verbs are so frequent in RCE. The answer lies in the interaction of information status and verb class. The data show that linking verbs differ from all other verb classes in that most of them carry old information. Among RCEs, 59.8% have linking verbs, and 65% of these carry old information: old linking verbs (38.92%) divided by old plus new linking verbs (38.92% + 20.96%) gives 0.65. Thus most of the linking verbs carry old information. This is shown in Fig. 3.

Fig. 4 shows the interaction of information status (new or old) and verb class in RCE and RCC. As can be observed, the behavior of linking verbs differs from that of all the other verb classes.

While in all the other verb classes the dominant information status is new, the only verb class in which given information dominates is linking verbs, which constitute 59.8% of the predicates in RCE. It can therefore be concluded that the information status of linking verbs contributes to their frequent occurrence in RCE. Fig. 5 shows that in linking verbs RCE is significantly higher than RCC.

Fig. 6 shows the interplay of weight and verb types in RCE. The aim is to see whether weight and verb type interact in RCE or whether they are independent factors.

The overall picture is that as RC length increases relative to VP length, the extraposition rate also increases. The way these different factors interact and their relative rank are discussed in the next section.

Fig. 3. The interplay of information status and verb class in RCC and RCE (Bar-Chart).

Fig. 4. The interplay of information status and verb class in RCC and RCE (Linear).

3.4. Definiteness

In order to determine the role of definiteness, the head nouns were coded as definite or indefinite.

As shown in Table 6, definite heads (58.6%) outnumbered indefinite heads (41.4%) in RCC, while this trend was reversed in the extraposed RCs: in RCE, indefinites (53.3%) outnumbered definites (46.7%).

3.5. Factors' strength of association

To evaluate the significance and the strength of the association (predictive power) between each discrete variable and extraposition, the chi-square test and Phi/Cramér's V were used, respectively. Among the model's discrete variables (verb type, information status, and definiteness), information status was not significant (see Table 7 for the variable removed from the equation), so its strength of association did not need to be measured.

The results show that the relation between definiteness and extraposition is significant (p = 0.006) and that its predictive power is .1. The direction of this relation can be evaluated from the crosstabs and the distribution of percentages. Since the relative frequency of definite heads is reversed in extraposed sentences (in canonical sentences definite heads outnumber indefinite ones, whereas in extraposed sentences indefinites outnumber definites), definiteness and extraposition have an inverse relationship. Fig. 7 shows this inverse relation graphically.
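The chi-square and Phi figures for definiteness can be approximately recomputed from the counts in Table 6. The following is a minimal sketch, not the authors' procedure: it assumes an uncorrected Pearson chi-square on the 2x2 table (the paper does not say whether a continuity correction was applied), and the names `observed` and `chi_square_2x2` are illustrative only.

```python
import math

# Head-noun counts from Table 6: rows = definite / indefinite,
# columns = extraposed (RCE) / canonical (RCC).
observed = [[78, 346],
            [89, 244]]

def chi_square_2x2(table):
    """Pearson chi-square without continuity correction (an assumption)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n

chi2, n = chi_square_2x2(observed)
# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
# For a 2x2 table, Phi and Cramér's V coincide.
phi = math.sqrt(chi2 / n)

print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, phi = {phi:.2f}")
```

Under these assumptions the result lands close to the reported p = 0.006 and strength of .1, which suggests that the reported figures derive from the Table 6 crosstab.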

To evaluate the significance and the predictive power of verb type with respect to extraposition, the logistic regression results in Table 8 and the verb type percentages in the two structures in Table 5 were consulted, respectively. Unaccusatives, states, and linking verbs were positively associated with extraposition, while unergatives, events, and passive verbs were negatively associated.

The logistic regression results showed that the relation of verb types and extraposition was significant. To evaluate the significance and the strength of correlation of interval (continuous) variables (here, weight) and extraposition in a binary way, point biserial correlation coefficient was applied.

The results showed that the point-biserial correlation coefficient, rpb, was .446, and that the relation between weight and extraposition was statistically significant (p < 0.001). The direction of this relation is discussed in Section 3.1, in particular in Fig. 1.

Fig. 5. RCE and RCC in different verb types (x-axis: predicate type, i.e. unergative, unaccusative, state, event, linking, passive).

Fig. 6. The interplay of weight and verb types in RCE (x-axis: relative weight range).

Fig. 1 shows that in RCE the RC was relatively longer than the VP, while their lengths were not very different in RCC; it is thus the weight of the RC relative to the VP, not the sheer weight of the RC, that determines the likelihood of extraposition.

3.6. Logistic regression analysis

Logistic regression analysis was applied to investigate how well the extraposition of Persian relative clauses can be predicted. This analysis is appropriate when the dependent variable is nominal and the independent variables are a combination of nominal and interval variables. The aim was to show which factors predict extraposition and which factor is the strongest.

In this model, three of the predictor variables had a significant relationship with the dependent variable (extraposition): verb type, definiteness, and weight. The information status variable had a significance level of .468; hence, its relation with extraposition was not significant and it was eliminated from the model. Based on Nagelkerke's R2, when weight was entered as the first variable, the model's predictive power was 24.9%. In the second step, with verb type entered, this figure rose to 42.9%. In the third step, definiteness entered the model and the figure rose to 43.9%; hence, the model predicts Persian relative clause extraposition at 24.9% at the lowest and 43.9% at the highest.

Fig. 7. Bar-Chart for Crosstab of the relation of definiteness and extraposition (definite vs. indefinite heads in canonical and extraposed sentences).

In the classification table, the accuracy of sentence classification was checked following the stepwise entry of the variables. All variables are included in the classification table, whether significant or not. With the entry of definiteness, the overall percentage of correct classification reached 85.4%. In this step, 527 canonical sentences (95.8%) were correctly classified as canonical and 85 extraposed sentences (50.9%) as extraposed. Hence, extraposition can be predicted with 85.4% accuracy from the four independent variables (information status, verb type, definiteness, and weight).
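The percentages in the classification table follow directly from the cell counts in Table 7. A minimal sketch of that arithmetic is given below; the dictionary layout and the function name `percent_correct` are illustrative choices, not part of the original analysis.

```python
# Rows = observed sentence type, columns = predicted sentence type (Table 7).
confusion = {
    "Canonical":  {"Canonical": 527, "Extraposed": 23},
    "Extraposed": {"Canonical": 82,  "Extraposed": 85},
}

def percent_correct(confusion):
    """Return per-class and overall percentages of correctly classified cases."""
    per_class = {}
    correct = total = 0
    for observed, predicted in confusion.items():
        row_total = sum(predicted.values())
        per_class[observed] = 100 * predicted[observed] / row_total
        correct += predicted[observed]
        total += row_total
    return per_class, 100 * correct / total

per_class, overall = percent_correct(confusion)
for label, pct in per_class.items():
    print(f"{label}: {pct:.1f}% correct")
print(f"Overall: {overall:.1f}% correct")
```

The overall figure is driven largely by the canonical class: only about half of the extraposed sentences are recovered at the .500 cut value.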

We used stepwise logistic regression because it deals simultaneously with all predictor variables and their interactions. In the first step, no variable (just the constant) was included. In the second step, the first independent variable was entered and then the other variables were included stepwise and their significance was examined. In the last step (step 3, Table 8), those variables which were not significant (information status) were excluded and only the significant variables were included.

Regression coefficients indicate the effect of the individual predictor variables on the outcome; but since the coefficients of a logistic regression are difficult to interpret, they are commonly transformed into odds ratios, a measure of effect size that indicates how likely a particular outcome is to occur (Agresti and Finlay, 2009).

The regression coefficients indicate the direction of change induced by a particular predictor: positive values (which correspond to odds ratios larger than 1.0) indicate that the predictor variable increases the likelihood that the relative clause is extraposed; negative values (which correspond to odds ratios smaller than 1.0) indicate that the predictor variable decreases that likelihood (Agresti and Finlay, 2009). The Wald values and the associated levels of significance indicate that the predictor variables (verb type, except for event and passive verbs; definiteness; and weight) were significant and hence contributed to the model in terms of strength of association. The Wald statistic plays a role similar to that of t in linear regression. Hence, in logistic regression, we must first see which variables have a significant relation with the dependent variable and then, by checking Exp(B), i.e. the odds ratio, see the size of their effect on the dependent variable. For instance, for cases with unaccusative matrix predicates, the odds of RCE are (on average) about 10.4 times the odds for unergatives, the base level.

The two final columns show the lower and upper boundaries of the confidence intervals for the odds ratios (Agresti and Finlay, 2009).
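The relation between the columns of Table 8 can be illustrated with a short sketch. It assumes the usual conventions: Wald = (B / S.E.)^2, Exp(B) = e^B, and a 95% interval of exp(B ± 1.96 × S.E.). Because it starts from the rounded B and S.E. values printed in the table, the results differ slightly from the published figures. The names `coefficients` and `summarize` are illustrative only.

```python
import math

# (B, S.E.) pairs copied from Table 8; unergatives are the base level for verb type.
coefficients = {
    "Unaccusative": (2.338, 0.675),
    "State":        (2.770, 0.669),
    "Event verb":   (0.773, 0.656),
    "Linking verb": (3.159, 0.614),
    "Passive":      (1.274, 0.765),
    "Definiteness": (0.548, 0.219),
    "Weight":       (0.277, 0.030),
}

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

def summarize(b, se):
    """Return Wald statistic, odds ratio, and 95% CI bounds for one coefficient."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    lower = math.exp(b - Z_95 * se)
    upper = math.exp(b + Z_95 * se)
    return wald, odds_ratio, lower, upper

summary = {name: summarize(b, se) for name, (b, se) in coefficients.items()}
for name, (wald, odds_ratio, lower, upper) in summary.items():
    print(f"{name:<13} Wald={wald:7.2f}  Exp(B)={odds_ratio:7.3f}  "
          f"95% CI=[{lower:.3f}, {upper:.3f}]")
```

An interval that contains 1.0, as for event verbs and passives, corresponds to a non-significant predictor at the .05 level.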

Table 8

Variables in the equation.

Independent variable   B        S.E.   Wald     df   Sig.   Exp(B) (odds ratio)   95% C.I. for Exp(B)
                                                                                  Lower    Upper
Unergative                             17.862   1    .000
Unaccusative           2.338    .675   11.979   1    .001   10.358                2.756    38.925
State                  2.770    .669   17.168   1    .000   15.963                4.305    59.184
Event verb             .773     .656   1.388    1    .239   2.166                 .599     7.832
Linking verb           3.159    .614   26.479   1    .000   23.536                7.067    78.381
Passive                1.274    .765   2.776    1    .096   3.575                 .799     16.001
Definiteness           .548     .219   6.242    1    .012   1.730                 1.125    2.660
Weight                 .277     .030   86.313   1    .000   1.319                 1.244    1.399
Constant               -4.645   .623   55.548   1    .000   .010

a. Variable(s) entered on step 1: Weight.

b. Variable(s) entered on step 2: Verb type.

c. Variable(s) entered on step 3: Definiteness.

Table 8 shows the third step of the forward regression, from which the non-significant factor among the four (information status) has been excluded. It reveals that passive and event verbs do not have a significant relationship with extraposition. Among the remaining variables, weight is the strongest predictor of extraposition, as indicated by the highest Wald statistic, while definiteness is the weakest predictor and verb type is intermediate. Among the verb types, extraposition is most strongly favored by linking verbs, followed by states and unaccusatives, relative to unergatives as the base level. Based on these findings, it is concluded that extraposition occurs more often with linking verbs, states, and unaccusatives, when the relative weight is greater, and when the head noun phrases are indefinite.

4. Discussion and typological implications

The results of this study show the interplay of different factors responsible for RCE in Persian. It was hypothesized that grammatical weight is an effective factor in RCE, and the data showed that when the extraposition distance is short, extraposition occurs more frequently, in accordance with the domain minimization principle (Hawkins, 2004). It was demonstrated that Persian tends to put longer elements at the end of the clause. The other factor in RCE was verb class. Among the different verb classes, RCE was most frequent with linking verbs. It is argued that linking verbs, unlike other verb classes, predominantly carry given information, so their information status contributes to their frequent occurrence in RCE. The findings of this paper are in line with Arnold et al. (2000), among others, who argue that constituent ordering is influenced by both newness and heaviness. They argue that newer and heavier constituents are difficult to produce, so the producer postpones them until as late in the sentence as possible. From the hearer's point of view, postponing heavy elements makes processing easier and reduces memory load in parsing (Hawkins, 1994). Many of the extraposed RCs in Persian have linking verbs in their matrix clause, meaning that the RCs were new compared with the verb of the main predicate, which normally represents grounded information. So it can be concluded that extraposed RCs in Persian are predominantly found when the extraposition distance is short and the RCs are new. While only 22% of RCs were extraposed in the data as a whole, the figure rises to 43.6% for those with a linking verb as the main predicate.

Although the data confirm that heavy RCs move more often than short ones, there are some sentences in which the RC was no longer than the VP, or was even shorter, and extraposition still occurred. As Fig. 6 shows, in almost all verb classes there are some extraposed relative clauses that are not longer than the VP. These unexpected items suggest that grammatical weight is not the only factor in extraposition and that, as discussed, other semantic and pragmatic factors may lead to extraposition of the RC even when it is shorter than the VP.

The findings of this study also have some typological implications for Persian. One of the main goals of the study was to test the prediction of Hawkins' (2004) theory that head-initial and head-final languages behave differently in ordering light and heavy constituents and, concentrating on Persian, to test whether this language acts more like a head-initial or a head-final language. While its surface structure is SOV, most of its other typological features correlate with VO languages (Dabir-Moghaddam, 2001, 2006). Newmeyer (2005:121) advocates Hawkins's approach and states that, compared with verb-initial languages, it "makes precisely the opposite length and ordering predictions for head-final languages. And to be sure, there is a heavy-before light effect in those languages, both in language use and in the grammar itself". Persian RCE can be used as evidence to support the hypothesis that this language is more like VO languages, while seemingly SOV on the surface. Among the different factors that affect Persian RCE, it can be argued that weight also plays a role, moving heavy RCs rightward to conform to the short-before-long tendency of VO languages (like English). If Persian were truly a head-final (OV) language, it would be expected to show a long-before-short preference (Yamashita and Chang, 2001), like Japanese. Therefore, RCE supports considering Persian a VO rather than an OV language.

5. Conclusion

This study tested the validity of different hypotheses concerning the factors that affect Persian RCE. It focused on definiteness, grammatical weight and verb class as three main motivations for RCE. The findings demonstrated that the grammatical weight of the RC in relation to the VP plays the main role in RCE. The two other factors, verb class and definiteness, are ranked lower. In sentences with linking verbs, which carry given information, relative clause extraposition is more frequent. These findings support Hawkins' (2004) domain minimization principle and provide more evidence for the hypothesis that Persian, while SOV on the surface, acts as a VO language by shifting heavy constituents rightward and placing them after short ones.


References

Agresti, A., Finlay, B., 2009. Statistical Methods for the Social Sciences, fourth ed. Pearson Prentice Hall, New Jersey.

Arnold, J.E., Wasow, Thomas, Asudeh, Ash, Alrenga, Peter, 2004. Avoiding attachment ambiguities: the role of constituent ordering. J. Mem. Lang. 51, 55-70.

Arnold, J.E., Wasow, Thomas, Losongco, Anthony, Ginstrom, Ryan, 2000. Heaviness vs. newness: the effects of structural complexity and discourse status on constituent ordering. Language 76 (1), 28-55.

Baltin, Mark, 1981. Strict bounding. In: Baker, Carl L., McCarthy, John (Eds.), The Logical Problem of Language Acquisition. The MIT Press, Cambridge, MA, pp. 257-295.

Baltin, M., 2006. Extraposition. In: Everaert, Martin, Riemsdijk, Henk van (Eds.), The Blackwell Companion to Syntax, vol. 2. Blackwell, Malden, MA, pp. 237-271.

Behaghel, O., 1909. Beziehungen zwischen Umfang und Reihenfolge von Satzgliedern. Indoger. Forsch. 25, 110-142.

Behaghel, O., 1930. Von deutscher Wortstellung. Zeitschrift für Deutschkunde. Jahrgang 44 Z. Unterr. 81-89.

Chang, F., 2009. Learning to order words: a connectionist model of heavy NP shift and accessibility effects in Japanese and English. J. Mem. Lang. 61, 374-397.

Clark, H.H., Clark, E.V., 1977. Psychology and Language: an Introduction to Psycholinguistics. Harcourt Brace Jovanovich, New York.

Culicover, Peter W., Jackendoff, Ray S., 2005. Simpler Syntax. Oxford University Press, Oxford.

Comrie, B., 1989. Language Universals and Linguistic Typology, second ed. Basil Blackwell, Oxford.

Dabir-Moghaddam, M., 2001. Word order typology of Iranian languages. J. Humanit. 2.8, 17-23.

Dabir-Moghaddam, M., 2006. Internal and external forces in typology: evidence from Iranian languages. J. Univers. Lang. 7, 29-47.

Dryer, M., 1992. The Greenbergian word order correlations. Language 68, 81-138.

Francis, E.J., 2010. Grammatical weight and relative clause extraposition in English. Cogn. Linguist. 21, 35-37.

Francis, E.J., Michaelis, L.A., 2014. Why move? How weight and discourse factors combine to predict relative clause extraposition in English. In: MacWhinney, B., Malchukov, A., Moravcsik, E. (Eds.), Competing Motivations in Grammar and Usage. Oxford University Press, Oxford, pp. 71-87.

Gregory, Michelle L., Michaelis, Laura A., 2001. Topicalization and left dislocation: a functional opposition revisited. J. Pragmat. 33, 1665-1706.

Gonnerman, L.M., 2012. The role of efficiency and complexity in the processing of verb particle constructions. J. Speech Sci. 2 (1), 3-31.

Gonnerman, L.M., Hayes, C.R., 2005. The professor chewed the students ... out: effects of dependency, length, and adjacency on word order preferences in sentences with verb particle constructions. In: Proceedings of the Twenty-seventh Annual Conference of the Cognitive Science Society. Erlbaum, Mahwah, NJ, pp. 785-790.

Gundel, J.K., 1988. Universals of topic-comment structure. In: Hammond, Michael, Moravcsik, Edith A., Wirth, Jessica R. (Eds.), Studies in Syntactic Typology. Benjamins, Amsterdam, pp. 209-239.

Hawkins, J.A., 1994. A Performance Theory of Order and Constituency. Cambridge University Press, Cambridge, UK.

Hawkins, J.A., 1999. Processing complexity and filler-gap dependencies across grammars. Language 75, 244-285.

Hawkins, J.A., 2004. Efficiency and Complexity in Grammars. Oxford University Press, Oxford.

Huck, G.J., Na, Y., 1990. Extraposition and focus. Language 66 (1), 51-77.

Huck, G.J., Na, Y., 1992. Information and contrast. Stud. Lang. 16 (2), 325-334.

Karimi, S., 2005. A Minimalist Approach to Scrambling, Evidence from Persian. Mouton de Gruyter, Berlin/New York.

Kayne, Richard S., 1994. The Antisymmetry of Syntax. MIT Press, Cambridge, MA.

Keenan, E., Comrie, B., 1977. Noun phrase accessibility and universal grammar. Linguist. Inq. 8 (1), 63-89.

Konieczny, L., 2000. Locality and parsing complexity. J. Psycholinguist. Res. 29 (6), 627-645.

Kuno, S., Takami, K., 2004. Functional Constraints in Grammar: on the Unergative-unaccusative Distinction. John Benjamins, Amsterdam.

Lohse, B., Hawkins, J.A., Wasow, T., 2004. Domain minimization in English verb-particle constructions. Language 80 (2), 238-261.

McCawley, J.D., 1987. Some further evidence for discontinuity. In: Huck, Geoffrey J., Ojeda, Almerindo E. (Eds.), Discontinuous Constituency. Academic Press, New York, pp. 185-200 (Syntax and Semantics 20).

McCawley, J.D., 1998. The Syntactic Phenomena of English. University of Chicago Press, Chicago.

Michaelis, L.A., Francis, H.S., 2007. Lexical subjects and the conflation strategy. In: Hedberg, N., Zacharski, R. (Eds.), Topics in the Grammar-pragmatics Interface: Papers in Honor of Jeanette K. Gundel. Benjamins, Amsterdam, pp. 19-48.

Newmeyer, F.J., 2005. Possible and Probable Languages: a Generative Perspective on Linguistic Typology. Oxford University Press, New York.

Quirk, R., Greenbaum, S., Leech, G., Svartvik, J., 1972. A Grammar of Contemporary English. Longman, London.

Rochemont, M.S., Culicover, P.W., 1990. English Focus Constructions and the Theory of Grammar. Cambridge University Press, Cambridge.

Ross, J.R., 1967. Constraints on Variables in Syntax. MIT, Cambridge, MA (dissertation).

Stallings, L., MacDonald, M.C., O'Seaghdha, P.G., 1998. Phrasal ordering constraints in sentence production: phrase length and verb disposition in heavy-NP shift. J. Mem. Lang. 39, 392-417.

Strunk, J., 2010. Enriching a tree bank to investigate relative clause extraposition in German. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), May 19-21, 2010, Valletta, Malta.

Strunk, J., 2014. A statistical model of competing motivations affecting relative clause extraposition in German. In: MacWhinney, B., Malchukov, A., Moravcsik, E. (Eds.), Competing Motivations in Grammar and Usage. Oxford University Press, Oxford, pp. 88-106.

Taghavipour, M.A., 2005. Persian Relative Clauses in Head-driven Phrase Structure Grammar. Ph.D. Dissertation. University of Essex.

Takami, K., 1999. A functional constraint on extraposition from NP. In: Kamio, Akio, Takami, Ken-ichi (Eds.), Function and Structure. John Benjamins, Amsterdam, pp. 23-56.

Uszkoreit, H., Brants, T., Duchier, D., Krenn, B., Konieczny, L., Oepen, S., Skut, W., 1998. Studien zur performanzorientierten Linguistik: Aspekte der Relativsatzextraposition im Deutschen. Universität des Saarlandes, Saarbrücken, pp. 1-14. CLAUS Report. No. 99.

Wasow, T., 1997a. Remarks on grammatical weight. Lang. Var. Change 9, 81-105.

Wasow, T., 1997b. End-weight from the speaker's perspective. J. Psycholinguist. Res. 26 (3), 347-361.

Wasow, T., 2002. Postverbal Behavior. CSLI Publications, Stanford.

Yamashita, H., Chang, F., 2001. Long before short preference in the production of a head-final language. Cognition 81, B45-B55.

Yamashita, H., Chang, F., 2006. Sentence production in Japanese. In: Nakayama, M., Mazuka, R., Shirai, Y. (Eds.), Handbook of East Asian Psycholinguistics. Cambridge University Press, Cambridge, UK, pp. 291-297 (volume 2, Japanese).