Scholarly article on topic 'Toward a generative theory of language transfer: Experiment and modeling of sC prothesis in L2 Spanish'

Academic journal
Open Linguistics

Research Article Open Access

Robert Daland, Ingrid Norrmann-Vigil

Toward a generative theory of language transfer: Experiment and modeling of sC prothesis in L2 Spanish

DOI 10.1515/opli-2015-0024

Received November 10, 2014; accepted September 16, 2015

Abstract: When native Spanish speakers produce English words with initial [s]-consonant clusters (sC), they sometimes produce a prothetic vowel, e.g. stigma > estigma. This paper reports a production experiment on this phenomenon, as well as computational modelling of the experimental results. Carlisle (1991a) proposed the 'resyllabification account' in which prothesis is a language transfer effect, whose essential motivation is to satisfy L1/Spanish syllable phonotactics. Replicating all previous work, a greater rate of prothesis was found in postconsonantal contexts than in postvocalic contexts (Rick (e)stinks > Ricky (e)stinks). A novel prediction is that when prothesis occurs, the [s] should have durational characteristics associated with the coda position, whereas it should have onset characteristics when prothesis does not occur; this was found. Another prediction is that a grammar which captures the variability in prothesis should in some sense be "between" the L1/Spanish and L2/English grammars. This latter prediction was tested by developing a constraint-based analysis of sC prothesis in Maximum Entropy Harmonic Grammar (Goldwater & Johnson, 2003). The results were consistent with a view of language transfer as 'linear interpolation' of constraint weights, conditioned on an 'effort' constraint reflecting how phonological planning varies with task/modality demands.

Keywords: phonotactics, prothesis, Spanish, maxent

1 Introduction

It is well-known that speakers who acquire additional languages as adults exhibit 'language transfer' - they speak a variant of the target language reflecting interference from earlier languages. Language transfer effects pose two related, but conceptually distinct problems for language science. The first is empirical - what do speakers actually do, and what factors influence their behavior? The other type of problem is theoretical - what aspects of non-native speech should be accounted for by grammatical theories, and how does the theory account for them? This paper addresses these questions in the context of native speakers of Mexican Spanish who began speaking English as adults, hereafter referred to as 'SpEns'.

When SpEns say English words beginning with an [s]-consonant cluster (hereafter referred to as sC), they often produce a prothetic vowel before the onset cluster, e.g. school → (e)school (Carlisle, 1991a; Goldstein, 2001; Harris, 1987; Yavas & Barlow, 2006). This is evidently a language transfer effect, since Spanish words may not begin with sC, English loanwords beginning with sC are adapted into Spanish with an initial [e] (Hualde, 2005; Núñez Cedeño & Morales-Front, 1999), and the Spanish lexicon contains many cognates in which English sC corresponds to Spanish esC (special ~ especial, state ~ estado, sculptor ~ escultor, etc.). Thus, a variety of facts suggest that for SpEns, prothesis is a preferred repair for sC. To what extent does linguistic theory offer any insight into why this might be?

Corresponding author: Robert Daland: University of California Los Angeles, Los Angeles, United States, E-mail:

Ingrid Norrmann-Vigil: University of California Los Angeles, Los Angeles, United States

© 2015 Robert Daland, Ingrid Norrmann-Vigil, published by De Gruyter Open.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

Robert Carlisle, who has produced the majority of empirical studies on sC prothesis in SpEns, proposed the resyllabification account (Carlisle, 1991a). The essence of the idea is that Spanish allows [s] in the syllable coda, but not sC in onset. Spanish phonotactics can be satisfied by 'moving' the [s] out of complex onset and into a preceding coda position. When the preceding word ends with a vowel, as in (1b), that word's lexical vowel may absorb the [s]. However, when the sC word is phrase-initial, as in (1a), a new coda position must be created through vowel prothesis. In both cases, the [s] is said to 'resyllabify' because it syllabifies with a vowel that does not originate from the same morpheme as the [s] itself.

(1) /spaniʃ/ /tu#skul/

a. [es.pa.niʃ] b. [tus.kul]


An appealing aspect of this account is that an analogous resyllabification occurs with rapid speech in Spanish (Harris, 1983), as shown in (2):

(2) /o.tros # es.pe.ran/

[o.tro.ses.pe.ran]


The process in (2) is an important motivator for the resyllabification account of prothesis (1a), because it illustrates that Spanish is willing to sacrifice word/syllable alignment for syllable well-formedness under at least some circumstances; and this is the common point between (1a), (1b), and (2).

Despite this commonality, (1a), (1b), and (2) are rather phonologically distinct. The native Spanish process (2) could be termed coda-to-onset movement, since the result is that a word-final obstruent is syllabified as the onset to the initial vowel of the following word. Therefore, (2) is amenable to an analysis in which resyllabification occurs to eliminate a syllable coda consonant, or supply a syllable onset consonant, or both. In contrast, the non-native processes represented in (1a,b) involve what might be called onset-to-coda movement. These items cannot be analyzed purely in terms of avoiding coda consonants or supplying onsets, since they do neither. In fact, resyllabification creates a syllable coda consonant in both (1a) and (1b), and (1a) creates an onsetless syllable (as well as the prothetic vowel). Evidently, (1a,b) call for a more complex analysis than does (2); minimally, the analysis should capture that resyllabification in (1a,b) is driven by avoidance of sC. A later section of this paper will develop a constraint-based analysis that generates the native/Spanish pattern, the target/English pattern, and the SpEn/prothetic pattern. Prior to that, however, it will be helpful to review previous work on sC prothesis, to which the paper now turns.

2 The resyllabification account of sC prothesis

The resyllabification account of sC prothesis is rooted in the phonotactics of Spanish, which may be summarized as follows. In licit word-initial clusters, the first consonant must be an obstruent ([p], [t], [k], [b], [d], [g] and [f]) and the second element must be a liquid (Macpherson, 1975; Harris, 1983; Hualde, 2005). Clusters containing an [s] followed by a consonant may occur postvocalically but not word-initially, e.g. estigma/*stigma.1 Spanish words commonly end with the singletons [s], [d], [n], [l], and [r] (Goldstein, 2001; Hualde, 2005; [x] may occur in words borrowed from Arabic such as reloj). Spanish words never end with consonant clusters, although triconsonantal clusters do occur word-medially, e.g. obstáculo, perspicaz, monstruo. Thus, following the principle that only licit word-initial clusters can be syllable onsets, esC clusters are standardly analyzed with the [s] in coda, e.g. e[s.l]avo 'Slav', hemi[s.f]erio 'hemisphere' (Macpherson, 1975; Núñez Cedeño & Morales-Front, 1999). As already shown in (2), /otros # esperan/ → [o.tro.ses.pe.ran], resyllabification occurs as a natural conversational process in Spanish: word-final consonants may be syllabified as onsets to following vowel-initial words in rapid speech (Harris, 1983).

By analogy with this native resyllabification process, Carlisle (1991a) proposed that sC prothesis could be explained as a phonotactic repair in which the English target is given a licit syllabification in Spanish. Since Spanish does not allow sC word-initially, but does allow [s] in coda, sC clusters can be repaired by 'moving' the [s] into coda position. Of course, codas must be licensed by vowel nuclei, and since there is (by definition) no lexical vowel nucleus before the word-initial consonant, a vowel must be inserted. Thus in the target speed, resyllabifying [s] to coda also requires prothesis (/spid/ → [es.pid]). The next two subsections develop two predictions of the resyllabification account that have been addressed in previous empirical studies.

2.1 Prediction for prothesis rate: C#sC > V#sC (controlling prosodic context)

One prediction of the resyllabification account is that the preceding context affects the rate of prothesis: prothesis is favored in a postconsonantal context (abbreviated C#sC) as compared with a postvocalic context (abbreviated V#sC). For example, prothesis should be more likely in Rick # (e)stinks than Ricky # (e)stinks. Although the prediction itself is stated in a way that makes it straightforward to test, it involves some subtleties that were not apparent to us when we began this work. Therefore, we walk through this prediction in some detail.

Previous work has considered three kinds of preceding context: phrase-initial (abbreviated #sC), postconsonantal, and postvocalic. For concreteness, we will use the nonce words pa and pak to stand in for vowel-final and consonant-final words, and sta to stand in for words beginning with sC onsets. Our approach in this section is inspired by the constraint-based approach to phonology (Prince & Smolensky, 1993/2002/2004), in which the speaker considers a variety of possible repairs and selects the least odious one (according to possibly language-specific priorities). However, in this section we will consider repairs involving only resyllabification and/or prothesis, as these are the ones which are actually attested in SpEns. Later, in the phonological analysis section, we will develop a fuller constraint-based account which incorporates other logically imaginable repairs, such as anaptyxis (medial epenthesis) and consonant deletion.

We begin with the postconsonantal context of pak#sta. In this case, a SpEn has four logical options, schematized in (3):

(3) postconsonantal (C#sC) repairs: /pak#sta/ →

a. [pak#sta] the 'faithful' candidate -- no resyllabification, no prothesis

b. [pak#s.ta] the 'resyllabification' candidate -- resyllabification, but no prothesis

c. [pak#es.ta] the 'prothetic' candidate -- prothesis and resyllabification

d. [pa.k#es.ta] the 'double resyllabification' candidate -- like (3c), but with coda-to-onset resyllabification

1 We follow the Spanish phonological tradition in assuming that Spanish has diphthongs, not consonantal glides. For example, if its second segment were analyzed as a glide consonant, then sueño 'dream' would violate the otherwise exceptionless generalization that sC clusters are prohibited. Similarly, if its third segment were analyzed as a glide consonant, prueba 'test/ exam' would violate the otherwise exceptionless ban on triconsonantal onset clusters. Therefore, we use the word 'consonant' to refer to [+cons] segments (liquids/nasals/plosives), but not to glides.

We will use the convention that in pronounced forms, a superscript hash mark # indicates a word boundary that is internal to a syllable (mis-aligned with syllable boundaries), while a period (.) indicates a word-internal syllable boundary; regular hash # indicates a word boundary that is also a syllable boundary. Thus, in the 'faithful' candidate (3a), the word boundary is perfectly aligned with the syllable boundary. This form is the correct/target pronunciation for English speakers, but it is illicit according to Spanish phonotactics because it contains an sC onset. Candidate (3c) is the 'prothetic' repair, in which a vowel is epenthesized before sta. The # symbol is placed before the prothetic vowel in (3c) to indicate that the prothetic vowel is not syllabified with any material from the preceding word. However, given that prothesis occurs, it is a logical possibility that the word-final consonant of pak would undergo coda-to-onset resyllabification as well; this possibility is represented in candidate (3d). Note that the string [pakesta] may be perceptually ambiguous between options (3c) and (3d). In fact, there is a comparable ambiguity with candidate (3a) -- it is logically possible that /pak#sta/ could undergo onset-to-coda resyllabification, yielding candidate (3b). Note that (3b) results in a pronounced form with a complex, Cs coda, which does not occur word-finally in Spanish. That is, (3b) satisfies one exceptionless phonotactic of Spanish at the cost of violating another, equally exceptionless phonotactic. With these facts in place, (3c) and (3d) are the only candidates which satisfy all the exceptionless principles of Spanish phonotactics, and both of these involve prothesis.

The situation is different with the postvocalic context pa#sta, where a SpEn has three options:

(4) postvocalic (V#sC) repairs: /pa#sta/ →

a. [pa#sta] the 'faithful' candidate -- no resyllabification, no prothesis

b. [pa#s.ta] the 'resyllabification' candidate -- resyllabification, but no prothesis

c. [pa#es.ta] the 'prothetic' candidate -- prothesis and resyllabification

As before, the 'faithful' candidate (4a) is the English/target form, which violates Spanish phonotactics because of the sC onset. In the resyllabification candidate (4b), the [s] has undergone onset-to-coda resyllabification, necessarily causing the word boundary to be mis-aligned with the syllable boundary. However, (4b) avoids an sC onset without actually requiring prothesis. Given the logical possibility of satisfying Spanish phonotactics without prothesis, it is tempting to wonder whether a prothetic candidate is even needed. We include one here, for several reasons. First, it is at least a logically possible repair, so in principle the grammar should account for its presence/absence. Second, although upon casual inspection the resyllabification candidate (4b) appears to be superior to the prothetic candidate (4c) in every dimension, (4c) does have one desirable property that is lacking in (4b): there is no prosodic structure that is shared between multiple words. That is, even though prothesis causes resyllabification in (4c), the resyllabification does not cross a word boundary as it does in (4b). The third and most important reason to include the prothetic candidate is that it is attested: all existing studies of SpEns which included the postvocalic context have found that prothesis actually does occur in this environment. In the phonological analysis later in this paper, we will sketch an account of how and why (4c) might occur in terms of prosodic planning. For now, the relevant points are as follows: prothesis (4c) repairs the sC onset without violating any exceptionless Spanish phonotactics, and so does resyllabification-without-epenthesis (4b).

Summarizing the above, prothesis is the only way to satisfy exceptionless Spanish phonotactics in the postconsonantal context (e.g. pak#sta). However, in the postvocalic context (e.g. pa#sta), there are two ways: resyllabification-with-prothesis (4c) and resyllabification-without-epenthesis (4b). The faithful candidate equally violates the sC phonotactic in both C#sC and V#sC contexts (3a, 4a, respectively), so the rate at which this candidate is avoided ought to be comparable across both contexts, all other things being equal. But when the faithful candidate is avoided in C#sC, the only viable alternative is prothesis, whereas when the faithful candidate is avoided in V#sC, resyllabification-without-epenthesis is a viable alternative to resyllabification-with-prothesis, and so prothesis should be selected less in V#sC than C#sC.
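The candidate comparison summarized above can be made concrete with a small Python sketch. It is illustrative only and not part of the original study: the syllable encoding, the function names, and the two checks are assumptions introduced here. In particular, the sketch treats any two-consonant coda as illicit, which simplifies the observation that Cs codas do not occur word-finally in Spanish.

```python
# Each candidate is a list of (onset, nucleus, coda) syllables for the
# nonce inputs /pak#sta/ and /pa#sta/. This encoding is an assumption.
CANDIDATES = {
    "C#sC": {
        "3a faithful": [("p", "a", "k"), ("st", "a", "")],
        "3b resyllabified": [("p", "a", "ks"), ("t", "a", "")],
        "3c prothetic": [("p", "a", "k"), ("", "e", "s"), ("t", "a", "")],
        "3d double resyllabification": [("p", "a", ""), ("k", "e", "s"), ("t", "a", "")],
    },
    "V#sC": {
        "4a faithful": [("p", "a", ""), ("st", "a", "")],
        "4b resyllabified": [("p", "a", "s"), ("t", "a", "")],
        "4c prothetic": [("p", "a", ""), ("", "e", "s"), ("t", "a", "")],
    },
}


def has_sC_onset(syllable):
    """True if the onset is [s] followed by at least one more consonant."""
    onset = syllable[0]
    return len(onset) >= 2 and onset[0] == "s"


def has_complex_coda(syllable):
    """True if the coda holds two or more consonants (simplified check)."""
    return len(syllable[2]) >= 2


def is_licit(candidate):
    """A candidate is licit here if no syllable fails either check."""
    return not any(has_sC_onset(s) or has_complex_coda(s) for s in candidate)


def has_prothesis(candidate):
    """The nonce inputs contain only [a], so an [e] nucleus is prothetic."""
    return any(nucleus == "e" for _, nucleus, _ in candidate)


def licit_candidates(context):
    """Names of the candidates that pass both checks in a given context."""
    return [name for name, cand in CANDIDATES[context].items() if is_licit(cand)]


for context in ("C#sC", "V#sC"):
    print(context, licit_candidates(context))
```

Under these assumptions, every surviving candidate in the C#sC context contains a prothetic vowel, whereas the V#sC context also retains a candidate without one, which is the asymmetry the prediction rests on.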

It is natural to wonder about the phrase-initial context #sC. This context is superficially similar to the postconsonantal context C#sC, since prothesis is the only way to avoid an sC onset. However, it differs from the C#sC and V#sC contexts in several ways. First, there are only two candidates, shown in (5):

(5) phrase-initial (#sC) repairs: /#sta/ →

a. [#sta] the 'faithful' candidate -- no resyllabification, no prothesis

b. [#es.ta] the 'prothetic' candidate -- prothesis and resyllabification

Unlike V#sC and C#sC, there is no option for resyllabification-without-prothesis. Another difference is that in (5), there is no candidate in which multiple words share prosodic structure (because there is only one word). A final difference between #sC and V#sC/C#sC is in the extent of prosodic planning involved, which we mention now because the experimental data we present later clearly indicate the rate of prothesis is sensitive to task demands. An sC word pronounced in isolation is phrase-initial, and the planning demands might be especially low in this situation (speakers may also attend especially to articulation when producing words in isolation). C#sC and V#sC both by definition involve at least two words, which necessarily involve more planning than a single word. However, if an sC word is at the beginning of a longer phrase, the speaker may begin producing it when their attention is engaged with planning the end of that phrase; conversely if C#sC/V#sC is phrase-final, most of the planning has presumably already been done. Put simply, all other things are not equal between #sC and C#sC/V#sC.

Indeed, the prediction about the relative rates of prothesis in C#sC vs. V#sC must be relativized to particular syntactic/prosodic contexts. Crosslinguistically, the likelihood of resyllabification across a prosodic boundary decreases with the 'strength' of the boundary. For example, resyllabification across an utterance boundary should be impossible, while resyllabification in an adjective-noun phrase like high # speed should be comparatively accessible for SpEns. When collecting data on the rate of prothesis, therefore, it is necessary to equate or at least control the prosodic context as much as possible.

2.2 Prediction for prothesis rate: sT > sN > sL (T = stop, N = nasal, L = liquid/glide)

According to some theories, the likelihood of prothetic repair for sC onset clusters should depend on the sonority of the C, with prothesis being more likely when C has low sonority (i.e. stops). This prediction does not follow from the resyllabification account alone, but requires some additional assumptions.

There are at least two theoretical perspectives from which a sonority effect is predicted. The Sonority Sequencing Principle (SSP: Sievers, 1881) states that syllables should rise in sonority from onset through the nucleus, and fall from the nucleus through the coda. It should be noted that the existence of a unified sonority scale, and by implication the SSP, is contested (e.g. Shaw & Davidson, 2012). However, among researchers who accept the SSP as a grammatical principle, it is uncontroversial that liquids are more sonorous than nasals, nasals are more sonorous than obstruents, and that stops are equal or lower in sonority than fricatives (Selkirk, 1984). Therefore, the SSP indicates sT onsets are more ill-formed than sN onsets, and sN onsets are more ill-formed than sL onsets (where T, N, and L indicate stops, nasals, and liquids respectively). Moreover, there is abundant evidence that speakers are sensitive to this kind of ill-formedness distinction even without positive evidence of it (e.g. Berent et al., 2007; Daland et al., 2011). Formally speaking, then, one way to derive the prediction that prothesis rate follows the pattern sT > sN > sL is by casting the SSP as a markedness hierarchy (Gouskova, 2004). Since sT is the most marked sC onset, the desire to repair it should be stronger; and since prothesis is the available repair, prothesis should be higher for sT than other sC onsets.
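The sonority reasoning above can be illustrated with a short Python sketch. The numeric sonority values are an assumption made for illustration (the text only fixes the ordering, and allows stops to be equal to or lower than fricatives; the sketch takes them to be lower), and the names SONORITY, SEGMENT_CLASS, and sonority_rise are hypothetical.

```python
# Hypothetical numeric sonority scale; only the ordering matters.
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4}

# Manner class of each segment appearing in the six sC clusters.
SEGMENT_CLASS = {
    "s": "fricative",
    "p": "stop", "t": "stop", "k": "stop",
    "m": "nasal", "n": "nasal",
    "l": "liquid",
}


def sonority_rise(cluster):
    """Sonority of the second consonant minus that of the first.

    A larger rise means the onset conforms better to the SSP;
    a negative value means sonority falls within the onset.
    """
    first, second = cluster
    return SONORITY[SEGMENT_CLASS[second]] - SONORITY[SEGMENT_CLASS[first]]


clusters = ["sp", "st", "sk", "sm", "sn", "sl"]
rises = {c: sonority_rise(c) for c in clusters}

# Sort from most to least ill-formed (smallest rise first).
ranked = sorted(clusters, key=sonority_rise)
print(rises)
print(ranked)
```

On this scale the stop clusters show a fall in sonority, the nasal clusters a small rise, and [sl] the largest rise, which reproduces the predicted ordering sT > sN > sL in degree of ill-formedness.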

An alternative account is based on recent work highlighting the role of perception in both native and non-native phonology (Kirchner, Hayes, & Steriade, 2004; Fleischhacker, 2001, 2005; Broselow, 2015). Fleischhacker (2001, 2005) analyzes vowel epenthesis in loanword adaptation using a faithfulness hierarchy. She conducts a cross-linguistic survey, finding that illicit onset clusters are preferentially repaired by anaptyxis (i.e. a vowel is epenthesized between the two consonants of the cluster). The only systematic exception to this is sC (and other sibilant-consonant clusters), which is repaired by anaptyxis in some cases and by prothesis in others. Fleischhacker argues that the reason prothesis is sometimes preferred for sC clusters is that anaptyxis introduces an especially large perceptual distortion in the s_C environment, both in comparison to other C_C contexts and in comparison to #_s. Moreover, she argues that anaptyxis in s_T contexts is perceptually worse than in s_N contexts, which is in turn worse than in s_L contexts (where T, N, and L refer to stops, nasals, and liquids, as above). Therefore, prothesis should be most strongly favored for sT onsets, and less strongly favored for sL onsets. Note that Fleischhacker's account is primarily based upon crosslinguistic patterns in loanword phonology rather than L2 phonology per se. However, the underlying perceptual basis is a mismatch between the L2 and perceptual expectations arising from L1 in both cases, so the prediction should equally hold for L2 phonology. Crucially, Fleischhacker's account makes the same prediction as the sonority account even though the underlying theories are quite different: prothesis rate should follow the pattern sT > sN > sL.

2.3 Previous empirical studies: definitely C#sC > V#sC; maybe not sT > sN > sL

In previous empirical studies on sC prothesis in SpEns, far and away the most robust finding has been that sC prothesis occurs more frequently postconsonantally than postvocalically (C#sC >> V#sC). This result has been obtained by every study whose design allowed for a test of it (Abrahamsson, 1999;2 Carlisle, 1991a et seq.; Rauber & Baptista, 2004). Additionally, it has consistently been found that the rate of sC prothesis (phonotactic repair) is higher with sC1C2 than comparable sC1 or sC2 onsets (e.g. split is more likely to undergo prothesis than spit or slit; Abrahamsson, 1999; Carlisle, 1991a, 2002; Rauber & Baptista, 2004). The triconsonantal cluster data have theoretical implications (in particular, they invite analysis as a ganging phenomenon), but since the present study does not focus on triconsonantal clusters, we will not discuss this point further.

When it comes to comparisons between phrase-initial and postconsonantal/postvocalic contexts, the results are more variable. Abrahamsson (1999) found that prothesis was least frequent postvocalically (34%), of intermediate frequency phrase-initially (62%), and most frequent postconsonantally (94%), which may be summarized as V#sC < #sC < C#sC (all differences significant). In contrast, Rauber & Baptista (2004) found #sC < V#sC < C#sC (prothesis rates: 17%, 23%, 40%, respectively; all differences significant). Since Rauber & Baptista did not offer an explanation for this discrepancy, we speculate that it derives from modality. Abrahamsson's speaker was conversing spontaneously, and had to allocate considerable cognitive resources to planning high-level aspects of his utterances (such as what message to express, which words to use to express that message, and the order they would occur in). In contrast, Rauber & Baptista's speakers were reading aloud, so the resources they saved by not doing high-level planning could be allocated to orthographic and phonological processing. Taken together, these findings suggest that the relative frequency of sC prothesis phrase-initially is subject to task variables, while it has consistently been found that V#sC < C#sC (for discussion see Abrahamsson, 1999, pp. 498-504).

When it comes to the prediction of a sonority effect, Carlisle has asserted that the rate of prothesis is highest for obstruent clusters and lowest for liquids, i.e. [sp], [st], [sk] >> [sm], [sn] >> [sl] (Carlisle, 1991b et seq.). However, our view is that the literature offers at best weak support for this claim. Table 1, below, shows the prothesis rates (as a percentage) reported for a variety of studies for clusters [sp], [st], [sk], [sm], [sn], and [sl].

Table 1. Prothesis rates for sC clusters in previous studies (percentages). Merged cells indicate that the authors did not distinguish the clusters when reporting prothesis rate.

Study                      [sp]  [st]  [sk]  [sm]  [sn]  [sl]  Notes
Abrahamsson (1999)         <----- 59 ----->  <-- 54 -->  75    Swedish
Carlisle (1988)            --    --    --    38    33    29
Carlisle (1991a, Exp't 1)  63    64    61    --    --    --
Carlisle (1991a, Exp't 2)  72    74    72    --    --    --
Carlisle (1991b)           --    36    --    --    --    25

Rauber & Baptista (2004) 26 37 27 20

2 Abrahamsson (1999) is a longitudinal study of a native Spanish speaker learning Swedish, rather than English. We group it with English studies since the inventory of initial clusters to be acquired is rather similar.

Evidently, there is a large degree of between-experiment variability in prothesis rate for the same cluster, which presumably reflects between-group differences in phonotactic proficiency, and/or task manipulations. Thus, any conclusions about sonority must be based on within-group comparisons. The relevant within-group comparisons are given in Table 2.

Table 2. Predicted vs. observed sonority effects in sC prothesis.

Prediction about prothesis rate   Predicted sonority effect   Anti-sonority effect
sT > sN                           Abrahamsson (1999)          --
sT > sL                           Carlisle (1991b)            Abrahamsson (1999)
sN > sL                           --                          Abrahamsson (1999)

Table 2 shows that there have been relatively few studies testing the same group on clusters with differing sonority profiles which did not collapse across the relevant clusters. Note that Rauber & Baptista (2004) included a coarse sonority manipulation, and the numerical values were consistent with a sonority effect (see Table 1), but they did not report the relevant statistical comparison, so we have not included that study in Table 2. The exceptionally high prothesis rate on [sl] clusters in Abrahamsson (1999) resulted in two comparisons that actually went in the opposite direction of what is expected by sonority. The most important generalization, therefore, is that remarkably little existing data actually bears on the sonority effect, and what data there is offers conflicting results.

2.4 Two additional predictions of the resyllabification account

Previous sections have discussed the resyllabification account in somewhat theory-neutral terms, and have carefully developed the predictions that have been tested in previous empirical work. Section 2 closes with two additional predictions of the resyllabification account, one necessarily involving a greater level of theoretical commitment.

A simple prediction of the resyllabification account is that when prothesis occurs, the [s] of an sC onset will occupy the coda position of the prothetic vowel, and will therefore have phonetic characteristics associated with an [s] in coda position. In contrast, when prothesis does not occur in an sC word, the [s] will occupy the onset position, and should therefore have phonetic characteristics appropriate to this position. Of course, this prediction is contingent on two conditions: first, it must be that these positions are phonetically distinguished in native English speakers, and second, it must be that non-native speakers acquire position-specific phonetic characteristics. If both of these conditions are met, the phonetic prediction offers a meaningful way to falsify the resyllabification account (although there may be other accounts which make the same prediction, an issue to which we return briefly in the General Discussion). Fortunately, the first condition is known to hold: fricatives are systematically longer in onset than in coda, when other factors such as phrase-final lengthening are controlled for (Redford, 2004). Thus the resyllabification account predicts that [s] should be shorter when there is a prothetic [e] than when there is not one.

An additional prediction of the resyllabification account is that the SpEn pattern -- variability between prothesis and faithful productions of word-initial sC -- is in some sense 'intermediate' between the L1/Spanish grammar and the L2/English target. This prediction can only be tested in the context of a specific phonological analysis where it is meaningful to speak of a grammar being 'intermediate' between two other grammars. Moreover, in order to faithfully characterize the SpEn data, the analysis must provide for optionality, i.e. stochastic variation between multiple outputs for the same input (namely prothesis vs. faithfulness for sC). These requirements are met by the phonological formalism known as Maximum Entropy Harmonic Grammar (MaxEntHG: Goldwater & Johnson, 2003; Hayes & Wilson, 2008; Daland et al., 2011). MaxEntHG is a probabilistic variant of Optimality Theory, in which constraints are weighted (rather than ranked). In this context, an interphonology grammar is "between" two other grammars if its weight vector is "between" theirs (in a mathematical sense that is formalized later). The resyllabification account may be deemed adequate if the intermediate weights yield a good match to the quantitative pattern of variation in SpEns (including the contextual effects).
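As a rough illustration of how weighted constraints yield stochastic variation, the following Python sketch computes candidate probabilities in the standard MaxEnt way: each candidate's penalty is the weighted sum of its constraint violations, and its probability is proportional to the exponential of the negated penalty. The constraint names (*sC-onset, Dep-V), the violation profiles, and all weights are hypothetical placeholders and are not the constraint set or fitted values of the analysis developed later in the paper.

```python
import math

# Hypothetical violation profiles for a phrase-initial /sta/ input.
# The faithful candidate keeps the sC onset; the prothetic candidate
# inserts a vowel instead.
VIOLATIONS = {
    "faithful": {"*sC-onset": 1, "Dep-V": 0},
    "prothetic": {"*sC-onset": 0, "Dep-V": 1},
}


def maxent_probabilities(weights, violations):
    """Return P(candidate) proportional to exp(-weighted violation sum)."""
    penalties = {
        cand: sum(weights[c] * n for c, n in profile.items())
        for cand, profile in violations.items()
    }
    normalizer = sum(math.exp(-p) for p in penalties.values())
    return {cand: math.exp(-p) / normalizer for cand, p in penalties.items()}


# Illustrative weights only: equal weights give a 50/50 split, while an
# English-like setting (sC tolerated, insertion costly) favors faithfulness.
equal_weights = {"*sC-onset": 2.0, "Dep-V": 2.0}
english_like = {"*sC-onset": 0.0, "Dep-V": 6.0}

print(maxent_probabilities(equal_weights, VIOLATIONS))
print(maxent_probabilities(english_like, VIOLATIONS))
```

Because the output is a probability distribution over candidates rather than a single winner, a grammar of this kind can represent variable prothesis directly, with the rate of prothesis determined by the relative weights.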

The remainder of the paper is structured as follows. Section 3 reports a production study of Mexican Spanish speakers producing sC words both in isolation and phrase-finally in V#sC and C#sC contexts; the six clusters [sp], [st], [sk], [sm], [sn], and [sl] were presented with roughly equal frequency. A statistical analysis finds the expected environmental effect (more prothesis in C#sC than V#sC) but a null effect of sonority (equal prothesis in sT, sN, sL); the analysis also revealed systematic task/modality effects suggesting that the rate of prothesis is correlated with the 'effort' involved in phonological planning. Section 4 opens by developing an Optimality Theory (OT) analysis of sC prothesis; it is shown how the Spanish and English patterns are obtained by different prioritizations of the same constraints. Next, the OT analysis is adapted for MaxEntHG. In the first simulation, the same constraints are fit to the categorical English mapping, the categorical Spanish mapping, and the variable SpEn pattern (i.e. the production data collected in the Experiment). It was found that the SpEn constraint weights are mostly intermediate between the Spanish and English weights, but SpEns have a much higher weight than either Spanish or English for a constraint that reflects cognitive 'effort'. In the second simulation, it is shown that the task and modality effects on prothesis can be precisely modeled by setting the 'effort' constraint weight high, and otherwise linearly interpolating constraint weights between the Spanish and English weights. The simulations therefore offer support for viewing language transfer as interpolation, when provision is made for the extra effort involved in producing speech in a non-native language. Section 6 discusses and concludes.
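The idea of language transfer as interpolation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simulation reported in the paper: the two constraints, the Spanish-like and English-like weights, and the mixing parameter lam are all hypothetical, and the 'effort' constraint is left out because its definition is given only in the later analysis.

```python
import math

# Hypothetical weight vectors for two illustrative constraints.
SPANISH = {"*sC-onset": 6.0, "Dep-V": 1.0}  # sC onsets heavily penalized
ENGLISH = {"*sC-onset": 0.0, "Dep-V": 6.0}  # vowel insertion heavily penalized


def interpolate(w_l1, w_l2, lam):
    """Weights at fraction lam of the way from the L1 to the L2 grammar."""
    return {c: (1 - lam) * w_l1[c] + lam * w_l2[c] for c in w_l1}


def prothesis_probability(weights):
    """MaxEnt probability of prothesis with two candidates.

    The faithful candidate violates *sC-onset once and the prothetic
    candidate violates Dep-V once, so the probability reduces to a
    logistic function of the weight difference.
    """
    return 1.0 / (1.0 + math.exp(weights["Dep-V"] - weights["*sC-onset"]))


steps = (0.0, 0.25, 0.5, 0.75, 1.0)
rates = [prothesis_probability(interpolate(SPANISH, ENGLISH, lam)) for lam in steps]

for lam, rate in zip(steps, rates):
    print(f"lam={lam:.2f}  P(prothesis)={rate:.3f}")
```

With these placeholder weights, the predicted rate of prothesis falls smoothly from near-categorical at the Spanish end to near zero at the English end, so an intermediate weight vector yields the kind of variable pattern observed in SpEns.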

3 Production experiment

The primary goal of the Experiment was to assess the phonological and phonetic predictions of the resyllabification account. An additional goal was to carefully control the dialect background of the participants.

3.1 Participants

Four adult speakers of Mexican Spanish were recruited for this study. The study focused specifically on Mexican Spanish, for two methodological reasons. First, in this dialect, /s/ is consistently realized as [s] in coda, regardless of the following environment (whereas in many other Spanish dialects [s] may debuccalize or delete altogether; Canfield, 1981; Hualde, 2005). Thus, it is methodologically desirable to control the dialect background (cf. Carlisle, 1991a, Experiment 2). The other reason for selecting Mexican Spanish was the ability to recruit participants. Although the number of speakers is modest, it is not out of line with previous studies on this topic in which dialect background was carefully controlled (Abrahamsson, 1999, n=1; Carlisle, 1991a, Experiment 1, n=4; Rauber & Baptista, 2004, n=9). Talker biographic details are given in the leftmost columns of Table 3, below.

Table 3. Biographic details of talkers, and stimulus lists.

Participant  Age  Years in US  Read Sentences  Read Words  Spoken Sentences  Spoken Words  Spanish
M1           48   27           A               A           A                 A             A
M2           35   9            A               A           B                 B             A
M3           52   30           B               B           A                 A             B
M4           27   10           B               B           B                 B             B

The final columns in Table 3 indicate the list (order) in which items were produced (see Procedure). Note that the age and length of residence in the US are quite heterogeneous; however, all participants have resided in the US for close to a decade or more, and length of residence does not appear to affect degree of accent after the first year (Flege, 1988). Participants' English proficiency was informally assessed through a brief oral interaction with the second author. Participants were recruited if they showed an intermediate level of English, meaning that they could communicate, read, and write in English but nonetheless showed a relatively high degree of L1 transfer in pronunciation, intonation, lexical choice, and/or grammar. Crucially, participants neither presented full transfer of Spanish phonotactics in English sC clusters, nor always produced sC clusters without [e] prothesis.

3.2 Stimuli

The stimuli consisted of 100 English sentences (36 critical, 36 control, 28 fillers) and 69 English words (30 critical, 39 fillers); all critical and control stimuli contained one of the following clusters: [sp], [st], [sk], [sm], [sn], [sl]. Critical sentences contained an sC-initial word with initial stress in the phrase-final position (or possibly followed by the stressless pronoun it), so that the nuclear accent fell on the vowel immediately following the sC cluster. Control sentences also contained an [s]-consonant cluster immediately prior to the nuclear accent, but differed in that a word boundary intervened between the [s] and the consonant. In half of the critical and control sentences, the cluster was postconsonantal, and in the other half it was postvocalic. The full stimulus list is given in the Appendix; examples are given in (6):

(6)            Critical                              Control
postcons.:     He bought stamps.                     The giant gun shoots tanks.
postvocalic:   This weekend we'll go to Stanford.    The new helper has to bus tables.

The reason for including control sentences was to guarantee that [s] would occur in coda position, so the phonetic characteristics of [s] could be measured when we knew it was a coda.

A separate sC-initial word was used in each critical item, and the sC targets were selected to avoid cognates (i.e. a pair like sculptor~escultor with similar forms and meanings, differing primarily in the presence/absence of an initial [e]). For each cluster, there were 3 critical sentence items with a postconsonantal environment, 3 more critical sentence items with a postvocalic environment, and 5 critical word items. Thus, the total number of critical sentences was 6 clusters * 2 environments * 3 phrases = 36, and the total number of critical words was 6 clusters * 5 words = 30.

There were also 3 postconsonantal and 3 postvocalic control sentences per cluster, so the number of control sentences was equal to the number of critical sentences. The filler sentences included 10 generic sentences, and 18 sentences in which sC-initial words occurred after an [s] (e.g. For lunch he enjoys Spam.) The s#sC items were originally intended as critical test items, but the phonetic analysis suggested these items exhibited a much wider array of repairs than other postconsonantal items, so those data were not analyzed here.

These materials comprised the written stimuli. Spoken variants of the stimuli were produced by a female English native speaker with high Spanish proficiency.

3.3 Procedure

Two stimulus lists were created by randomizing the order of sentences within the sentence block and the order of words within the word block. List presentation was counterbalanced across speakers and blocks (see Table 3). Speakers were digitally recorded using Audacity (Mazzoni et al., 2007), a MacBook computer (OSX 10.5.8), and a Logitech USB desk microphone, model 980186-0403. Participants were first presented with the English sentences and read them aloud. Next, they read aloud the English words in isolation. Then, participants heard spoken versions of the English sentences produced by the female English speaker, and repeated the sentence out loud. Finally, participants repeated after spoken versions of the English words. Thus the order was READ SENTENCES, READ WORDS, SPOKEN SENTENCES, SPOKEN WORDS. They were asked to repeat each sentence/word twice. (Speakers also produced some Spanish sentences after the English stimuli, but those data are not analyzed here.) The production experiment took about 60 minutes per speaker.

The following measurements were made: presence, duration and three formants of the [e] prothesis; duration of [s] from the onset sC; duration of the gaps (if any) that occurred before and/or after the [e] prothesis; duration of [s] from the consonant cluster; any aspiration of stops. The present study will focus only on the presence/absence of [e] prothesis and the duration of [s]; therefore those measurements are described and illustrated further.

3.3.1 Presence, absence, and duration of prothetic [e].

In cases where sC occurred immediately after a consonant or a silence, prothesis was assessed from the waveform and spectrogram. Contrasting examples are shown in Figure 1 and Figure 2. In Figure 1, the stop closure is followed immediately by the sC cluster, so no prothesis was diagnosed. In Figure 2, voicing and formant structure clearly intervene between the stop closure and sC. In cases like Figure 2, duration of [e] was measured from the beginning of the first regular peak where formants (F1 and F2) were visually apparent to the end of the wave pattern, as illustrated in Figure 2.



Figure 1. Measurement of English stimuli without vowel prothesis (He b[ot st]amps).





Figure 2. Measurement of English stimuli with vowel prothesis (He b[ot est]amps).

In the cases where sC followed a final vowel with no intervening silence, perceptual evidence was used to determine whether [e] was prothesized. If the native Spanish-speaking author perceived a prothetic vowel, then slight changes in the wave pattern or formants were used to determine the boundary; in general the boundary fell at around 3/4 of the total length of both vowels combined. A representative example is shown in Figure 3.





Figure 3. Measurement of English stimuli with vowel prothesis after a vowel (Sar[a esk]ewers).

3.3.2 Duration of [s]

To determine the duration of [s], the beginning and end boundaries were set at the points where frication noise appeared on the majority of the scale of the spectrogram (as in Figs. 1-3).

3.4 Results

To give a global view of the data, as well as some idea of the inter-speaker variability, the data are visually summarized in Figure 4, which reports the rate as a function of cluster (Figure 4a) and environment by modality and repetition (Figure 4b). The rate of prothesis by participant and environment is shown in Table 4.

Table 4: Mean rate of prothesis (percentage) by participant and environment.

speaker  postvocalic  isolation  postconsonantal
M1       42.0         15.8       63.0
M2       27.9         6.8        26.1
M3       14.9         27.7       34.8
M4       58.6         55.8       88.2

Beyond Table 4, the data were subjected to two analyses. The first analysis was a mixed-effects logistic regression analysis, designed to investigate which factors influenced the prothesis rate. This analysis was partly intended to validate the method, since we expected to replicate and extend previous findings (e.g. the effect of environment). The second analysis used Wilcoxon rank sum tests to investigate the phonetic hypothesis that [s] would have shorter, coda-like durations when prothesis occurred, and longer, onset-like durations otherwise. All statistics were computed using R (Urbanek and Iacus, 2009).

[Figure 4 panels: (a) prothesis rate by cluster (sp, sm, st, sn, sl, sk); (b) prothesis rate by task and environment (V#sC, C#sC), with separate bars for spoken-1, spoken-2, read-1, and read-2.]

Figure 4. Prothesis rate (a) by cluster and (b) by environment and task. Error bars represent mean standard error.

3.4.1 Statistical analysis of factors affecting prothesis rate

Logistic regression is appropriate for cases in which there are a large number of trials with the same binary outcome (in this case, prothesis or no prothesis). The coefficients of a logistic regression model express the (change in) log-odds of a 'success' (prothesis) as a consequence of the factor's value. Mixed-effects models offer increased flexibility in terms of random effect structure, in particular the ability to explicitly model subject and item effects using random slopes and coefficients (Jaeger, 2008). Thus, mixed-effects logistic regression is an excellent tool for analyzing sC prothesis in our dataset.

The baseline model (AIC 978, BIC 1056, log-likelihood -473)3 included modality (read vs. spoken), environment (isolation, postvocalic, postconsonantal), place (labial, coronal, dorsal), sonority (obstruent, nasal, liquid), and repetition (1, 2) as fixed effects (the contrast-coded baseline levels, underlined in the original, are read, postvocalic, coronal, nasal, and repetition 1, as reflected in Table 5), with a random coefficient for each distinct lexical target, and random slopes and coefficients for participant by environment. The fixed effects output summary from the baseline model is given below:

3 The log-likelihood represents the (log of the) product of the probabilities that the statistical model assigns to each individual observation. For example, if the statistical model assigned a flat probability of 0.6 to prothesis and 0.4 to no prothesis, then the log-likelihood of a data set with 3 prothesis tokens and 2 no-prothesis tokens would be $\ln((0.6)^3 \cdot (0.4)^2)$. Thus, the log-likelihood is the (log of the) probability of the data given the model, $\ln \Pr(\text{data} \mid \text{model})$. Log-likelihood is always negative, and becomes more negative with more data points. But for the same data set, a larger (less negative) value represents a better fit. All other things being equal, a model with more free parameters will be better able to fit the same data. The AIC and BIC are derived from the log-likelihood, but include a penalty for model complexity. Abstractly, they are both of the form $\mathrm{XIC} = -2 \cdot (\ln \Pr(\text{data} \mid \text{model}) + \text{penalty})$, where penalty may be interpreted as a 'prior' that assigns lower probabilities to models with more free parameters. Because of the negative sign, a lower AIC/BIC represents a 'better' model overall (i.e. complexity-penalized model fit). AIC differs from BIC in that AIC assigns a flat penalty that depends on the number of free parameters, whereas the penalty assigned by the BIC scales with the number of data points. In effect, free parameters can 'pay their way' under the AIC by explaining a fixed amount of variance; free parameters have to 'pay their way' under the BIC by explaining a fixed amount of variance per datum. Thus, AIC tends to be more conservative for 'small' datasets, while BIC tends to be more conservative for 'large' datasets.
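The toy example in the footnote can be checked numerically. The sketch below is illustrative only: it assumes the standard textbook definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n (the footnote gives only the abstract form), and the function names are hypothetical rather than taken from the analysis scripts.

```python
import math

def log_likelihood(probs):
    """Sum of the log probabilities a model assigns to each observation."""
    return sum(math.log(p) for p in probs)

def aic(loglik, k):
    # Assumed standard definition: flat penalty of 2 per free parameter.
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    # Assumed standard definition: per-parameter penalty grows with ln(n).
    return -2.0 * loglik + k * math.log(n)

# Footnote example: 3 prothesis tokens at p = 0.6, 2 no-prothesis tokens at p = 0.4.
toy_ll = log_likelihood([0.6] * 3 + [0.4] * 2)

# The same quantity in closed form: ln(0.6^3 * 0.4^2).
closed_form = math.log(0.6 ** 3 * 0.4 ** 2)

# The toy model has one free parameter (the prothesis probability) and n = 5 tokens.
toy_aic = aic(toy_ll, k=1)
toy_bic = bic(toy_ll, k=1, n=5)

print(round(toy_ll, 4), round(toy_aic, 4), round(toy_bic, 4))
```

Under these definitions, the BIC penalty per parameter exceeds the AIC penalty once ln n > 2, i.e. for data sets of roughly eight or more observations.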

Table 5. Mixed-effects logistic regression.

                      Estimate   Std. Error  z value  Pr(>|z|)  Sig.
(Intercept)           -1.50168   0.60760     -2.471   0.0135    *
modality: spoken      -1.36451   0.17342     -7.868   3.59e-15  ***
environment: iso-     -0.61657   0.63174     -0.976   0.3291
environment: post-     1.02205   0.48700      2.099   0.0358    *
place: LAB            -0.35315   0.27881     -1.267   0.2053
place: DOR             0.73945   0.36506      2.026   0.0428    *
sonority: obstruent   -0.13078   0.27866     -0.469   0.6388
sonority: liquid      -0.02812   0.36405     -0.077   0.9384
repetition             0.94068   0.16898      5.567   2.59e-08  ***

As expected, the baseline model yielded an effect of environment, with prothesis being $e^{1.02} = 2.78$ times as likely to occur postconsonantally as postvocalically, all other things being equal (prothesis was numerically about half as likely to occur in isolated words, relative to postvocalically, but this difference did not reach significance; it is evident from Table 4 that there is considerable variability in the relative rate of prothesis in isolation). Interestingly, there were strong effects of modality and repetition: prothesis was almost four times less likely when speakers were repeating after a spoken prompt, and about 2.5 times as likely on the second repetition as on the initial production. No sonority effect was observed; the coefficient for liquids was numerically negative (less prothesis than in nasals, as expected), but not significantly so; the obstruent coefficient was also numerically negative (contra prediction), but again not significantly so. Unexpectedly, there was a significant effect of place: the dorsal cluster [sk] triggered higher rates of prothesis than coronal clusters (and by extension, labial ones); there is a numerical trend for prothesis rate to increase as the place of articulation becomes more posterior, but the labial vs. coronal comparison did not reach significance.
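The "times as likely" figures in this paragraph follow from exponentiating the log-odds coefficients in Table 5. A minimal sketch of that conversion is given below; strictly speaking, the exponentiated values are odds ratios, and the dictionary keys simply reuse the row labels of Table 5.

```python
import math

# Fixed-effect estimates (log-odds) copied from Table 5.
coefficients = {
    "modality: spoken": -1.36451,
    "environment: iso-": -0.61657,
    "environment: post-": 1.02205,
    "repetition": 0.94068,
}

def odds_ratio(log_odds):
    """Convert a logistic-regression coefficient into a multiplicative change in odds."""
    return math.exp(log_odds)

ratios = {name: odds_ratio(b) for name, b in coefficients.items()}
for name, ratio in ratios.items():
    print(f"{name}: odds multiplied by {ratio:.2f}")

# A negative coefficient is easier to read as "1/ratio times less likely".
spoken_reduction = 1.0 / ratios["modality: spoken"]
print(f"spoken prompt: about {spoken_reduction:.1f} times less likely")
```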

The use of place and sonority as fixed effects with no interaction was intended to isolate sonority from the nuisance variable place (no interaction term was used because there is only one place with a liquid); the place effect was unexpected. It is possible, however, that this factorization was inappropriate, and that it would be better to model each cluster individually. To test for this possibility, we fit a minimal variant of the baseline model in which the place and sonority factors were replaced with a single fixed effect cluster (baseline [st]). Since this model is not nested with respect to the baseline model, it is not appropriate to perform direct significance testing. However, indirect evidence may be gleaned by comparing the model fit statistics (AIC 979.9, BIC 1063, log-likelihood -473). These statistics show that factorization into place and sonority yields an equivalent fit to the data (same log-likelihoods), but a simpler model overall (BIC is 1063 - 1056 = 7 lower, implying a complexity-penalized likelihood ratio of $e^{(1063-1056)/2} = 33.1$; BIC differences above 4.4 are considered significant and differences above 10 are considered very strong evidence for one model over the other; Raftery, 1995). Thus, factorization into place and sonority did not inappropriately distort the statistical model; if anything it resulted in a better statistical model of the results.
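The arithmetic behind the BIC comparison is short enough to verify directly. The sketch below uses only the two BIC values reported above and the exp(difference / 2) approximation that the text applies; the variable names are illustrative.

```python
import math

bic_place_sonority = 1056.0  # baseline model with place and sonority factors
bic_cluster = 1063.0         # variant with a single 'cluster' factor

# A lower BIC is better, so the difference favors the place + sonority model.
delta_bic = bic_cluster - bic_place_sonority

# Complexity-penalized likelihood ratio implied by the BIC difference.
penalized_ratio = math.exp(delta_bic / 2.0)

print(delta_bic, round(penalized_ratio, 1))
```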

Since a place effect has not been reported before, despite chances for it to emerge (Carlisle, 1991a; Rauber & Baptista, 2004), a follow-up analysis was conducted. Close inspection of the stimuli showed that the [sk] targets had an unusually high proportion of compounds in the sentence items (see Appendix); naturally these lexical targets were also comparatively longer phonologically than the other sentential targets. Either of these lexical factors could have been responsible for the higher rate of [sk] prothesis, rather than the dorsal place of articulation per se. Thus, two alternatives to the baseline model were considered, in which the binary fixed effect compound and the integer-valued fixed effect phonemes (for the number of phonemes in the target) were added, respectively. The compound model (AIC 958.34, BIC 1041.1, log-likelihood -462.17) resulted in a significant improvement over the baseline model ($\chi^2$=21.615, df=1, p<1e-5), as did the phoneme model (AIC 961.42, BIC 1044.2, log-likelihood -463.71; $\chi^2$=18.534, df=1, p<1e-4). However, a final alternative containing both fixed effects (AIC 959.84, BIC 1047.4, log-likelihood -461.92) was not better than either fixed effect alone. Moreover, while the compound model was significantly better than the baseline model, the compound factor itself was intriguingly only marginally significant, despite having a numerically large value (0.8994, p = .0576); an analogous effect was found for the phonemes model. The combination of high coefficient and lack of statistical significance may be tentatively interpreted as suggesting that compound status does matter considerably, but the result did not reach significance since there were only a small number of compounds in the whole dataset. As for the place effect, in both the compound and phoneme models, the dorsal coefficient dropped below significance although the coefficient value remained numerically non-negligible (dorsal coefficient in the compound model: 0.4558, p = 0.2657; in the phoneme model: 0.66884, p = 0.1111). Given that previous studies on this topic have not found a place effect, the most likely interpretation is that there is a compound effect which explains away the 'extra' prothesis for [sk]; however, these data do not rule out the interpretation that there is a genuine place effect that is masked here by a combination of data sparsity and the compound factor. This issue might be avoided in future work by controlling the prosodic shape and morphological status of the sC target items; alternatively, the putative compound effect might make for a worthy follow-up.
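The nested-model comparisons reported here are likelihood-ratio tests with one degree of freedom. The sketch below reproduces the reported p-value bounds from the reported statistics, using the identity that a chi-square upper tail with df = 1 equals erfc(sqrt(x/2)); the helper names are hypothetical, and the original analysis was run in R rather than with this code.

```python
import math

def lrt_statistic(loglik_small, loglik_big):
    """Likelihood-ratio statistic for nested models: 2 * (ln L_big - ln L_small)."""
    return 2.0 * (loglik_big - loglik_small)

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Reported statistics for adding one fixed effect to the baseline model.
p_compound = p_value_df1(21.615)
p_phonemes = p_value_df1(18.534)

# Recomputing from the rounded log-likelihoods gives a nearby value; the small
# gap reflects rounding of the baseline log-likelihood (-473).
approx_compound_stat = lrt_statistic(-473.0, -462.17)

# Adding 'phonemes' on top of 'compound' (log-likelihoods -462.17 vs. -461.92).
p_both_vs_compound = p_value_df1(lrt_statistic(-462.17, -461.92))

print(p_compound, p_phonemes, round(approx_compound_stat, 2), round(p_both_vs_compound, 2))
```

The last comparison illustrates why the model with both factors is described as no better than the compound model alone: the extra parameter buys almost no improvement in log-likelihood.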

3.4.2 Durational characteristics of [s]

Figure 5 shows four histograms of the duration of [s], with milliseconds as the x-axis and raw frequency on the y-axis. The top row shows the distribution for coda [s] and onset [s] in the English stimuli for this study. The bottom row shows the distribution for [s] when prothesis has occurred (prediction: coda-like) and when prothesis has not occurred (prediction: onset-like).

[Figure 5 panels, x-axis in ms: NS - coda [s] (The giant gun shoot[s] tanks; 97 ms); NS - onset [s] (He bought [s]tamps; 123 ms); NNS - epenthesis (He bought (e)[s]tamps; 92 ms); NNS - no epenthesis (He bought [s]tamps; 121 ms).]

Figure 5. Histogram of native speaker's [s] duration distributions in coda and onset position and nonnative speakers' [s] duration distributions with and without prothesis.

The pattern of means and significant differences is illustrated below, with means in the labeled rows and columns, and intervening comparisons (Wilcoxon rank sum tests) in unlabeled rows/columns.

                 Coda/prothesis                Onset/No prothesis
Native English   97 ms           * (p<.01)     123 ms
                 ns (p=.234)                   ns (p=.533)
SpEns            92 ms           * (p<.01)     121 ms

Two crucial generalizations emerge from these comparisons. First, when SpEns prothesize before sC, the [s] duration is not distinct from the duration of a native English speaker's [s] in coda. Second, when non-native speakers do not prothesize before sC, the [s] duration is not distinct from the duration of a native English speaker's [s] in onset. (The absence of significant differences in these two comparisons is meaningful, because the native English speaker does exhibit the expected onset~coda asymmetry, and the non-native speakers do exhibit a prothesis~no prothesis contrast.) In short, when SpEns prothesize in sC contexts, their [s] has phonetic characteristics associated with native English coda [s]; when they do not, it has phonetic characteristics associated with native onset [s] in the sC contexts. This is exactly what the resyllabification account predicts.
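For readers unfamiliar with the test behind these comparisons, the sketch below implements a two-sided Wilcoxon rank sum test with the normal approximation. It is illustrative only: the duration values are invented for the example and are not the experimental measurements, the function name is hypothetical, and R's own implementation may use an exact test or corrections that this sketch omits.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Tied values receive average ranks; no tie or continuity correction is applied.
    Returns the rank sum of x, the z score, and the two-sided p-value.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p

# Hypothetical [s] durations in ms (NOT the experimental data).
coda_like = [88, 92, 95, 97, 90, 101, 85, 94]
onset_like = [118, 125, 121, 130, 115, 123, 119, 127]

w, z, p = rank_sum_test(coda_like, onset_like)
print(w, round(z, 2), p)
```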

3.5 Discussion

To summarize, the data exhibited a significant effect of preceding environment (prothesis: C#sC >> V#sC), consistent with all previous studies which have looked for this phenomenon. There was no evidence of a sonority effect in the rate of prothesis. However, there was an unexpected numerical trend toward a place effect (prothesis: [s]-dorsal > [s]-coronal > [s]-labial); the significance of the dorsal/coronal comparison was eliminated once the confounding factor of compound status was incorporated in the model, so the numerical place differences may represent a statistical fluke. Of greatest import for the experimental hypotheses being tested, the phonetic predictions of the resyllabification account were borne out: [s] was longer in non-prothetic productions and shorter in prothetic productions, and these durations were statistically indistinguishable from the native speaker's onset and coda duration distributions, respectively. The effects of preceding environment and the [s] duration data are exactly as predicted by the resyllabification account. Of course, these data may be amenable to alternative explanations, an issue to which we return in the General Discussion. In the next section, we turn toward computational modeling of these experimental results, and ask to what extent linguistic theory can provide a quantitative and explanatory account of sC prothesis as language transfer.

4 Modeling sC prothesis as language transfer

This section describes two simulations designed to test the hypothesis that sC prothesis can be modeled by a grammar that is 'between' the native/Spanish grammar and the target/English grammar. Following Broselow, Chen, & Wang (1998), the section begins by developing an OT analysis whose factorial typology includes both the English-like and Spanish-like pattern of repairs for onset clusters. The OT analysis is then extended by embedding it in the MaxEntHG framework (Goldwater & Johnson, 2003; et seq.). MaxEntHG is a probabilistic variant of OT with weighted constraints. Because it is probabilistic, it explicitly predicts the kind of variation that is observed in SpEn productions. And because the framework uses weighted constraints, there is a set of principled and well-defined tests for the claim that the SpEn grammar is 'between' the Spanish and English grammars.
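To make the notion of a grammar 'between' two others concrete, the sketch below shows how a MaxEnt grammar turns weighted constraint violations into candidate probabilities, and how interpolating the weights moves the predicted prothesis rate. It is a toy illustration, not the simulations reported in this paper: it uses only two of the constraints defined in Section 4.1, the weights are invented, and the 'effort' constraint is left out.

```python
import math

def maxent_probs(violations, weights):
    """P(candidate) is proportional to exp(-sum of weight * violation count)."""
    penalty = {
        cand: sum(weights[c] * n for c, n in marks.items())
        for cand, marks in violations.items()
    }
    z = sum(math.exp(-h) for h in penalty.values())
    return {cand: math.exp(-h) / z for cand, h in penalty.items()}

def interpolate(w_a, w_b, lam):
    """Linear interpolation of weights: lam = 0 gives w_a, lam = 1 gives w_b."""
    return {c: (1.0 - lam) * w_a[c] + lam * w_b[c] for c in w_a}

# Two candidates for /sta/: the faithful one violates *#sC, the prothetic one dep/#_C.
STA = {"[sta]": {"*#sC": 1}, "[es.ta]": {"dep/#_C": 1}}

# Hypothetical weights, chosen only so that each grammar is near-categorical.
spanish_like = {"*#sC": 10.0, "dep/#_C": 2.0}
english_like = {"*#sC": 0.0, "dep/#_C": 8.0}

prothesis_rates = []
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    weights = interpolate(spanish_like, english_like, lam)
    prothesis_rates.append(maxent_probs(STA, weights)["[es.ta]"])

print([round(r, 3) for r in prothesis_rates])
```

With these made-up weights, the prothesis probability falls smoothly from near 1 to near 0 as the weights move from the Spanish-like to the English-like end, which is the sense in which an intermediate grammar predicts variable prothesis.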

The modeling in this section seeks to account for the categorical mappings of sC onsets in English (faithful) versus Spanish (prothesis), as well as for the variation observed in SpEns. However, the modeling is actually slightly more ambitious than this, because it deals with three different classes of onsets: obstruent-liquids (OL, licit in both English and Spanish), sC (licit in English, illicit in Spanish), and TT (noncontinuant-noncontinuant onsets, illicit in both English and Spanish). The patterns of repair we assume for these items are illustrated below:

(7) Language   OL              sC              TT
               e.g. /pla/      e.g. /sta/      e.g. /kta/
    English    faithfulness    faithfulness    anaptyxis
               /pla/→[pla]     /sta/→[sta]     /kta/→[keta]4
    Spanish    faithfulness    prothesis       anaptyxis
               /pla/→[pla]     /sta/→[esta]    /kta/→[keta]

The other onset types are included so that the resulting modeling is more responsible to typological data. As noted earlier, Fleischhacker's (2001, 2005) typological survey finds that anaptyxis is the 'default' repair for onset clusters that are illicit in the borrowing language; prothesis is only observed for sC onsets.5 The modeling that we report here is much more convincing if it can account not only for sC prothesis in SpEns, but also the preference for anaptyxis in other illicit clusters.

4.1 The constraints

The constraints used in the analysis are collected here for the reader's reference. The motivation for each constraint is discussed in the next three sections, where the analysis proper is developed.

(8) constraints used in computational modeling

• *#sC -- violated by words which begin with [s] followed by any [+cons] segment (not glides)

• *#TT -- violated by words beginning with a sequence of two noncontinuants (sp, pl; *kt, *kn)

• dep/C_C -- violated when a word's SR contains a vowel in a C_C context not present in the UR

• dep/s_C -- violated when the epenthetic vowel is after [s]/[ʃ] and before a consonant

• dep/#_C -- violated when the epenthetic vowel is word-initial before a consonant

• max -- violated when a word contains a segment in the UR with no correspondent in the SR

• ons -- violated when a syllable does not have an onset consonant

• *cod -- violated when a syllable has a coda consonant

• al-l -- violated if the first pronounced consonant of a word is not syllable-initial

• al-r -- violated if the last pronounced consonant of a word is not syllable-final

• *CC# -- violated by word-final consonant clusters

• WdByWd -- violated when a syllable contains segments from two different words

4.2 OT analysis: Spanish loanword repairs and native resyllabification

This subsection develops an analysis of Spanish which includes the repairs applied to loanwords with illicit onsets. We explicitly adopt the perspective of Boersma & Hamann (2009) that loanword adaptation results

4 For an item like /kta/, the pronounced English form would presumably have a schwa rather than an [e], and might or might not have aspiration on the initial [k], and would certainly have aspiration on the prevocalic [t]. We abstract away from this kind of phonetic/allophonic detail, since for this paper the focus is on how to predict the general type of repair, rather than associated allophonic detail.

5 Fleischhacker (2001, 2005) and Broselow (2015) do not refer specifically to sC onsets, but more generally to fricative-consonant onsets. However, all of the specific prothesis examples they cite are sC onsets. Therefore, we assume prothesis is restricted to sC onsets for the purposes of this paper. The constraint set could easily be adjusted to allow for prothesis in fricative-consonant onsets, if such turn out to be attested.

from the application of the native grammar (in this case, Spanish) to lexical representations containing phonotactically illicit sequences (here, sC and TT onsets). As per Fleischhacker (2005) and Broselow (2015), the fundamental fact to be captured is that for these configurations, the Spanish grammar maps /sta/→[es.ta] and /kta/→[ke.ta] even though the language lacks lexical representations of either form. In addition, the analysis should capture the optionality of conversational resyllabification.

As for sC prothesis, the fundamental markedness constraint at play is *#sC.6 There are various ways to repair the *#sC violation; the candidates we consider are given below, followed by a name that indicates the nature of the repair. We begin with the isolation/phrase-initial context:

(9) candidates for /sta/

a. [sta] -- faithful

b. [es.ta] -- prothesis

c. [se.ta] -- anaptyxis

d. [ta] -- deletion

Both the prothetic (9b) and anaptyktic (9c) candidates include an epenthetic vowel, which is punished by dep. However, dep does not distinguish the two options. To distinguish them, we replace dep with the constraints proposed by Fleischhacker (2001, 2005): dep/s_C and dep/#_C. The former constraint is supposed to encode that there is a particularly large perceptual distortion when a vowel is inserted between a sibilant fricative and a following consonant, while the latter encodes the perceptual distortion that occurs when a vowel is inserted word-initially. These constraints only address prothesis versus faithfulness in sC onsets. However as shown in (7), the full Spanish pattern includes prothesis in sC onsets, anaptyxis in other phonotactically illicit onset clusters, and faithfulness in licit onsets. Since both Spanish and English allow stop-liquid onsets, and both disallow {stop,nasal}-{stop,nasal} onsets, we assume the relevant constraint bans words that begin with a noncontinuant-noncontinuant sequence, abbreviated *#TT. We use /kta/ to stand in for the class of URs whose faithful realizations violate this constraint. The candidates for /kta/ are entirely parallel to /sta/:

(10) candidates for /kta/

a. [kta] -- faithful

b. [ek.ta] -- prothesis

c. [ke.ta] -- anaptyxis

d. [ta] -- deletion

Here, it is sufficient to include one additional faithfulness constraint, dep/C_C, which punishes anaptyxis for every consonant cluster (including sC clusters). To anticipate briefly, we will assume the fixed ranking dep/s_C >> dep/#_C >> dep/C_C for Spanish and English, and demonstrate how it accounts for the variation between Spanish and English. The non-occurrence of the deletion candidate can be explained by undominated max.

Finally, it is worth considering several additional constraints that appear to be relevant, even though they are violated by winning candidates in Spanish or English, or else freely violated in the language's phonotactics more generally. As already noted in the introduction, the native Spanish resyllabification process is amenable to an analysis in terms of ons and *cod (violated by syllables which lack an onset consonant, and possess a coda consonant, respectively). As resyllabification is not automatic in this case, it must be opposed by some kind of pressure for prosody to align with morphology. We will assume the relevant constraints punish candidates in which the initial (final) pronounced consonant of a word is not initial (final) in a syllable, abbreviated Al-L (Al-R).

6 It would be possible to distinguish varying degrees of markedness by splitting the constraint, e.g. *#s[+cons], *#s[-approx], *#s[-son], but we dispense with this step, since it is not justified by our experimental data.

The Spanish pattern is illustrated in (11)-(14):

(11) Spanish constraint ranking

Max, *#TT, *#sC, dep/s_C >> dep/#_C >> dep/C_C >> ons, *cod, Al-L, Al-R

(12) Prothesis for #sC

/sta/        max  *#TT  *#sC  DEP/s_C  DEP/#_C  DEP/C_C  ONS  *COD  Al-L  Al-R
  [sta]                 *!
► [es.ta]                              *                 *    *     *
  [se.ta]                     *!                *
  [ta]       *!

(13) Anaptyxis for #TT

/kta/        Max  *#TT  *#sC  DEP/s_C  DEP/#_C  DEP/C_C  ONS  *COD  Al-L  Al-R
  [kta]           *!
  [ek.ta]                              *!                *    *     *
► [ke.ta]                                       *
  [ta]       *!

(14) Resyllabification in conversational speech

/las#as/ Max *#TT *#sC DEP/s_C DEP/#_C DEP/C_C ONS *COD Al-L Al-R

► [las#as] * *

► [la.s'as] * *

[la.s'a] *!

[las#a] *!

[la#a] *!*

Tableaux (12) and (13) capture the essential mappings /sta/→[es.ta] and /kta/→[ke.ta]. In both cases there is an undominated markedness constraint banning the (faithful) onset cluster, and undominated max bans deletion. The ranking dep/s_C >> dep/#_C ensures that prothesis is preferred to anaptyxis for sC, while the ranking dep/#_C >> dep/C_C yields the opposite preference for other illicit clusters. The remaining constraints (other than Al-R) must be inactive in (12, 13), since all of them are violated by the winning prothetic candidate in (12). Finally, the native conversational resyllabification process is illustrated in (14), where both options incur violations of bottom-ranked syllable well-formedness or prosody/morphology alignment constraints. We indicate that variation is possible in this case by using multiple, gray ► symbols rather than a single black ► symbol. Note that classical OT does not provide a formal mechanism to encode the rate-of-speech effect, whereby faithfulness gives way to markedness at faster rates of speech and/or more informal speech registers. Thus, in (14) we simply indicate that variation is possible and assign all relevant constraints to the same bottom tier.
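The evaluation in (12) and (13) can be sketched as a lexicographic comparison of violation profiles. The code below is a minimal illustration under two stated assumptions: the strata in (11) are flattened into one total order compatible with that ranking, and the assignment of violation marks to constraints is our reading of the tableaux and the constraint definitions in (8). The function and variable names are illustrative.

```python
# One total order compatible with the Spanish ranking in (11); ties within a
# stratum are broken arbitrarily, which does not affect these two tableaux.
SPANISH = ["Max", "*#TT", "*#sC", "dep/s_C", "dep/#_C", "dep/C_C",
           "ons", "*cod", "Al-L", "Al-R"]

# Violation marks per candidate (assumed reading of tableaux (12) and (13)).
TABLEAUX = {
    "/sta/": {
        "[sta]": {"*#sC": 1},
        "[es.ta]": {"dep/#_C": 1, "ons": 1, "*cod": 1, "Al-L": 1},
        "[se.ta]": {"dep/s_C": 1, "dep/C_C": 1},
        "[ta]": {"Max": 1},
    },
    "/kta/": {
        "[kta]": {"*#TT": 1},
        "[ek.ta]": {"dep/#_C": 1, "ons": 1, "*cod": 1, "Al-L": 1},
        "[ke.ta]": {"dep/C_C": 1},
        "[ta]": {"Max": 1},
    },
}

def ot_winner(candidates, ranking):
    """Return the candidate whose violation profile is lexicographically smallest.

    Comparing tuples ordered from highest- to lowest-ranked constraint implements
    strict domination: one violation of a higher constraint outweighs any number
    of violations of lower ones.
    """
    def profile(cand):
        return tuple(candidates[cand].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

winners = {ur: ot_winner(cands, SPANISH) for ur, cands in TABLEAUX.items()}
print(winners)
```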

4.3 OT analysis: English

The English analysis is much the same as the Spanish analysis, except that *#sC is bottom-ranked.

(15) English constraint ranking

Max, *#TT, dep/s_C >> dep/#_C >> dep/C_C >> dep, ons, *cod, Al-L, Al-R >> *#sC

(16) Faithfulness for #sC

/sta/        max  *#TT  DEP/s_C  DEP/#_C  DEP/C_C  ONS  *COD  Al-L  Al-R  *#sC
► [sta]                                                                   *
  [es.ta]                        *!                *    *     *
  [se.ta]               *!                *
  [ta]       *!

(17) Anaptyxis for #TT

/kta/        Max  *#TT  DEP/s_C  DEP/#_C  DEP/C_C  ONS  *COD  Al-L  Al-R  *#sC
  [kta]           *!
  [ek.ta]                        *!                *    *     *
► [ke.ta]                                 *
  [ta]       *!

(18) Resyllabification in conversational speech?

/las#as/ Max *#TT DEP/s_C DEP/#_C DEP/C_C ONS *COD Al-L Al-R *#sC

► [las#as] * *

► [la.s'as] * *

[la.s'a] *!

[las#a] *!

[la#a] *!*

Thus, English grammar yields the mappings /sta/→[sta] and /kta/→[ke.ta]. The former mapping is driven by the fact that no faithfulness constraint is ranked lower than the relevant markedness constraint *#sC. The latter mapping is identical to Spanish. For completeness and parallelism, we have included the analogous resyllabification tableau for English, where the predictions for coda-to-onset movement are the same as in Spanish.

4.4 OT analysis: SpEns

Comparing the Spanish and English rankings (11, 15), it is evident that the only meaningful difference (as regards the phenomena treated here) is the position of *#sC: it is undominated in Spanish, and dominated in English. The relative rankings of the other active constraints (Max, *#TT, dep/s_C >> dep/#_C >> dep/C_C) are the same. Thus, it is natural to imagine a 'continuum' of rankings in which the position of *#sC is varied between the top and bottom. However, inspection of tableau (16) reveals that there is only one 'step' on this continuum which changes the mapping for /sta/: prothesis will occur whenever *#sC >> dep/#_C, and a faithful mapping will result otherwise. Unfortunately, classical OT does not provide a formal mechanism for precisely encoding this kind of variability in ranking. Nonetheless, the essence of the idea is that the SpEn behavior is captured by exactly this kind of continuum, an intuition that will be formalized shortly with MaxEntHG.
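The claim that only one 'step' on the continuum changes the outcome can be checked by sliding *#sC through every position in the ranking. The sketch below does this for /sta/; as before, the strata are flattened into a total order and the violation marks are our reading of tableau (16), so it should be taken as an illustration rather than as part of the analysis proper.

```python
# The shared ranking of the other constraints, flattened into a total order.
OTHERS = ["Max", "*#TT", "dep/s_C", "dep/#_C", "dep/C_C",
          "ons", "*cod", "Al-L", "Al-R"]

# Violation marks for the /sta/ candidates (assumed reading of tableau (16)).
STA = {
    "[sta]": {"*#sC": 1},
    "[es.ta]": {"dep/#_C": 1, "ons": 1, "*cod": 1, "Al-L": 1},
    "[se.ta]": {"dep/s_C": 1, "dep/C_C": 1},
    "[ta]": {"Max": 1},
}

def ot_winner(candidates, ranking):
    """Strict domination as lexicographic comparison of violation profiles."""
    def profile(cand):
        return tuple(candidates[cand].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Insert *#sC at each possible position, from top-ranked (0) to bottom-ranked.
outcomes = []
for pos in range(len(OTHERS) + 1):
    ranking = OTHERS[:pos] + ["*#sC"] + OTHERS[pos:]
    outcomes.append(ot_winner(STA, ranking))

for pos, winner in enumerate(outcomes):
    print(pos, winner)

# Count how many adjacent positions differ in their winner.
switches = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
```

The winner flips exactly once, at the point where *#sC drops below dep/#_C, so a strict-ranking grammar can only produce categorical prothesis or categorical faithfulness.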

Prior to that, we wish to expand the analysis to incorporate the preceding environment, as (10)-(18) deal only with the phrase-initial environment and coda-to-onset resyllabification. Recall from the Background that the post-consonantal environment involves two additional candidates beyond the simple prothesis candidate: a resyllabification-without-prothesis candidate and a double-resyllabification candidate. The post-vocalic environment involves only one additional candidate beyond the simple prothesis candidate: resyllabification-without-prothesis. These options are repeated in (19) below:

(19) Candidates involving preceding context

postconsonantal (C#sC) repairs: /pak#sta/ →

a. [pak#sta] the 'faithful' candidate -- no resyllabification, no prothesis

b. [pak#s.ta] the 'resyllabification' candidate -- resyllabification, but no prothesis

c. [pak#es.ta] the 'prothetic' candidate -- prothesis and resyllabification

d. [pa.k#es.ta] the 'double resyllabification' candidate -- like (19c), but with coda-to-onset resyllabification

postvocalic (V#sC) repairs: /pa#sta/ →

e. [pa#sta] the 'faithful' candidate -- no resyllabification, no prothesis

f. [pa#s.ta] the 'resyllabification' candidate -- resyllabification, but no prothesis

g. [pa#es.ta] the 'prothetic' candidate -- prothesis and resyllabification

Note that in the postconsonantal environment, onset-to-coda resyllabification (19b) yields a complex coda cluster, which never occurs in Spanish. This candidate should therefore be blocked in Spanish by undominated *CC# (which bans consonant clusters word-finally). Note that onset-to-coda resyllabification as in (19b) does not occur in English either, but for a different reason. The constraint *CC# cannot be responsible for blocking it in English, since English allows complex coda clusters as freely as it allows onsetless syllables. However, it is also the case in English that *#sC does not exert any 'pressure' to avoid the faithful candidate (19a). Therefore, to properly handle the environmental effect, it is necessary to incorporate the additional candidates in (19) and incorporate the *CC# constraint. This constraint, like *#sC, must differ between Spanish and English. In Spanish, it must be undominated, while in English, it joins the ranks of inactive constraints like ons and *cod.

Now in the postvocalic environment, there is a puzzle. According to the constraints we have considered so far, prothesis (/pa#sta/ → [pa#es.ta]) is harmonically bounded by resyllabification-without-epenthesis (/pa#sta/ → [pas.ta]). Both candidates satisfy the most important *#sC constraint, but resyllabification-without-epenthesis does so without vowel epenthesis. In contrast, prothesis has a violation of comparatively high-ranked dep/#_C. Both candidates create a *cod violation, but prothesis creates a gratuitous ons violation. Both candidates create an al-l violation, since the [s] is not a syllable onset in either case, and neither candidate incurs an al-r violation (as the preceding word does not end in a consonant). According to the general logic of constraint-based phonology, resyllabification-without-epenthesis should be universally favored over prothesis. If the competition is really between (19f) and (19g), how can (19g) ever win?
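The harmonic bounding relation invoked here can be stated as a simple check over violation profiles. The sketch below is illustrative only: it encodes the two profiles exactly as described in the preceding paragraph (with the constraints considered up to this point), and the function name `harmonically_bounds` is hypothetical.

```python
CONSTRAINTS = ["Max", "*#TT", "*#sC", "DEP/s_C", "DEP/#_C", "DEP/C_C",
               "ONS", "*COD", "Al-L", "Al-R"]

# Violation profiles as described in the text for the V#sC environment.
resyllabified = {"*COD": 1, "Al-L": 1}                            # (19f)
prothetic = {"DEP/#_C": 1, "ONS": 1, "*COD": 1, "Al-L": 1}        # (19g)


def harmonically_bounds(a, b, constraints=CONSTRAINTS):
    """True if candidate a is never worse than b on any constraint
    and strictly better on at least one."""
    no_worse = all(a.get(c, 0) <= b.get(c, 0) for c in constraints)
    strictly_better = any(a.get(c, 0) < b.get(c, 0) for c in constraints)
    return no_worse and strictly_better


print(harmonically_bounds(resyllabified, prothetic))  # (19f) bounds (19g)
print(harmonically_bounds(prothetic, resyllabified))  # the reverse fails
```

Because the bounded candidate loses under every ranking or weighting of these constraints, no reprioritization alone can make (19g) win; some further constraint or a restriction on the candidate set is required.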

There are two logical possibilities. One is that the analysis is incomplete, and there is some additional constraint which specifically favors prothesis over resyllabification-without-prothesis. The other logical possibility is that candidate (19f) is sometimes unavailable, e.g. owing to the larger prosodic organization of the utterance. For example, if the speaker pauses before the sC target, candidate (19f) will be blocked, because resyllabification is generally blocked across large prosodic boundaries. It turns out there is a formal solution which does not require distinguishing these explanations: we use the constraint WdByWd to punish candidates in which a single syllable shares segmental material from multiple words. This might be construed as a prosody-morphology alignment constraint; alternatively, it could be interpreted in processing terms as indicating the level of effort involved in upcoming planning. Presumably planning effort is correlated with prosodic boundaries, as speakers are more likely to pause when they are thinking hard, and resyllabification is blocked across prosodic breaks of a sufficient size. Owing to the task and modality effects that were found in the experiment, we favor the latter interpretation, but what matters for modeling purposes is that this constraint provides a formal solution to the puzzle of how prothesis could ever be selected over resyllabification-without-epenthesis. The final tableaux reflecting preceding environments are shown in (20) and (21); note that these tableaux only list the violations, as the relative prioritization of *CC# and WdByWd varies. Note finally that Al-R assigns the same violations as WdByWd in the C#sC tableau (20), but they are distinct constraints as illustrated in the V#sC tableau (21).

(20) C#sC

/pak#sta/ max *#TT *CC# *#sC DEP/s_C WdByWd DEP/#_C DEP/C_C ONS *COD Al-L Al-R

[pak#sta] *

[pak#s.ta] * * *

[pak#es.ta] * * * *

[pa.k#es.ta] * *

[pak#se.ta] * *

[pak#ta] *

(21) V#sC

/pa#sta/ Max *#TT *CC# *#sC DEP/s_C WdByWd DEP/#_C DEP/C_C ONS *COD Al-L Al-R

[pa#sta] *

[pa's.ta] *

[pa#es.ta] * * * *

[pa#se.ta] * *

[pa#ta] *

Tableaux (10-15, 20, 21) form the heart of the OT analysis for both Spanish and English. A more precise, quantitative account of SpEns will be given using MaxEntHG. We turn now to describing this formalism.

4.5 MaxEntHG: an overview

4.5.1 Formal definition

Maximum Entropy Harmonic Grammar is the natural embedding of constraint-based grammars into the log-linear statistical framework. Like an OT analysis, a MaxEntHG analysis consists of a set of lexical representation inputs $\{in_i\}_{i=1}^{N}$, a set of possible pronounced form outputs for each lexical representation $i$, $\{out_{ij}\}_{j=1}^{n(i)}$, a finite set of constraints $\{C_k\}_{k=1}^{K}$ that assign a scalar value to every input-output pair, $C_k(in_i, out_{ij})$, and associated weights $\{w_k\}_{k=1}^{K}$. (Note that the terms used here reflect standard linguistic usage. Outside of linguistics, where maximum entropy models were invented, the normal terms are 'contexts' for inputs, 'outcomes' for candidates, and 'feature functions' for constraints.)

The Harmony ('score') of an input-output pair is the weighted sum of its constraint violations. The conditional probability of an output given an input is the likelihood of selecting that output over other outputs, given a particular input; it is proportional to the exponential of the score of the input-output pair. The requirement that the sum of conditional probabilities for an input sum to 1 uniquely determines these conditional probabilities given a set of constraint violations and associated weights. These verbal statements are compactly summarized with equations in (22):

(22) harmony: $H(in_i, out_{ij}) = \sum_{k=1}^{K} w_k \cdot C_k(in_i, out_{ij})$

conditional probability: $\Pr(out_{ij} \mid in_i) = \exp(H(in_i, out_{ij})) / Z(in_i)$

partition function: $Z(in_i) = \sum_{j=1}^{n(i)} \exp(H(in_i, out_{ij}))$

The 'partition function' is a normalizing constant, whose function is to ensure that the sum of all conditional probabilities for a given input adds to 1, i.e. that the grammar assigns a well-defined probability distribution. This is illustrated in a highly simplified tableau (23) illustrating prothesis vs. faithfulness with just two constraints, weighted $w_{*\#sC} = -5$, $w_{DEP} = -4$.

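(23)

| /sta/ | *#sC (w = -5) | DEP (w = -4) | Harmony score | $e^{H}$ | conditional probability |
|---|---|---|---|---|---|
| a. [sta] | * | | (-5)(1) + (-4)(0) = -5 | $e^{-5}$ = 0.007... | $e^{-5}/(e^{-5}+e^{-4})$ = .269... |
| b. [esta] | | * | (-5)(0) + (-4)(1) = -4 | $e^{-4}$ = 0.018... | $e^{-4}/(e^{-5}+e^{-4})$ = .731... |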
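The calculation in (22) and (23) can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `maxent_probabilities` and the dictionary layout are illustrative assumptions and do not reflect PhoMEnt's interface.

```python
import math


def maxent_probabilities(violations, weights):
    """violations: {candidate: {constraint: count}}
    weights: {constraint: weight}, with negative weights as in the text.
    Returns the conditional probability of each candidate."""
    harmony = {
        cand: sum(weights[c] * n for c, n in viols.items())
        for cand, viols in violations.items()
    }
    z = sum(math.exp(h) for h in harmony.values())  # partition function
    return {cand: math.exp(h) / z for cand, h in harmony.items()}


# Tableau (23): two candidates, two constraints.
weights = {"*#sC": -5.0, "DEP": -4.0}
tableau_23 = {"[sta]": {"*#sC": 1}, "[esta]": {"DEP": 1}}

probs = maxent_probabilities(tableau_23, weights)
print({cand: round(p, 3) for cand, p in probs.items()})
```

With these weights the faithful candidate receives a probability of about .269 and the prothetic candidate about .731, as in (23).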

The next subsection explains why MaxEntHG is an appropriate formalism to use for modeling language transfer.

4.5.2 Why use MaxEntHG to model language transfer?

MaxEntHG is a desirable formalism for modeling language transfer effects, and specifically sC prothesis, for several reasons. First, because it is a probabilistic formalism, it includes explicit formal mechanisms designed to model variation, which the experiment and all previous research show is ubiquitous in sC prothesis in SpEns.

Second, because MaxEntHG uses weighted constraints, there is a well-defined sense in which one grammar can be 'between' two others. MaxEntHG grammars are characterized by a vector of weights $w = [w_i]_{i=1}^{k}$ associated with the constraints $C = [C_i]_{i=1}^{k}$ of the analysis, where each weight $w_i$ is a real number encoding the prioritization of the corresponding constraint $C_i$. A weight vector $w = [w_i]_{i=1}^{k}$ is 'between' two other weight vectors $x = [x_i]_{i=1}^{k}$ and $y = [y_i]_{i=1}^{k}$ if and only if $x_i < w_i < y_i$ or $y_i < w_i < x_i$ for all $i$. Given a grammatical analysis of Spanish and English with associated weight vectors $w_{Sp}$ and $w_{En}$, it is mathematically straightforward to define whether a SpEn's grammar $w_{SpEn}$ is 'between' $w_{Sp}$ and $w_{En}$.

A final reason to use MaxEntHG is that it has attractive theoretical properties. One desirable computational property is the guarantee that for a given analysis and dataset there exists a statistically optimal weight vector which can be found efficiently (Berger, Della Pietra, & Della Pietra, 1996; Eisner, 2002). Another desirable property arises from the nature of the linear computation on the constraint violations: MaxEntHG includes the capacity to model cumulativity effects, better known as 'constraint ganging' (for discussion see Pater, to appear). MaxEntHG's treatment of ganging offers the potential to model contextual effects in language transfer, because it predicts that low-priority constraints will have the most visible quantitative effects when there is variation. This point is illustrated schematically in the next subsection.

4.5.3 Constraint ganging

Tableau (24) illustrates how constraint ganging works in MaxEntHG. It repeats tableau (23), except with an additional markedness constraint, ons.

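(24)

| /sta/ | *#sC (w = -5) | DEP (w = -4) | ONS (w = -1) | Harmony score | $e^{H}$ | conditional probability |
|---|---|---|---|---|---|---|
| a. [sta] | * | | | (-5)(1) + (-4)(0) + (-1)(0) = -5 | $e^{-5}$ = 0.007... | $e^{-5}/(e^{-5}+e^{-5})$ = .5 |
| b. [esta] | | * | * | (-5)(0) + (-4)(1) + (-1)(1) = -5 | $e^{-5}$ = 0.007... | $e^{-5}/(e^{-5}+e^{-5})$ = .5 |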

As shown in (24), the addition of a single relatively low-weighted constraint has shifted the probability distribution considerably: the probability of prothesis was 0.731 in (23), but is only 0.5 in (24). An important but underappreciated point is that this kind of shift is only apparent when there is a close competition between the best candidates. Tableaux (25, 26) repeat (23, 24), except that the weight of the core markedness constraint *#sC has been boosted to -12.

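(25)

| /sta/ | *#sC (w = -12) | DEP (w = -4) | Harmony score | $e^{H}$ | conditional probability |
|---|---|---|---|---|---|
| a. [sta] | * | | (-12)(1) + (-4)(0) = -12 | $e^{-12}$ = 0.000... | $e^{-12}/(e^{-12}+e^{-4})$ = .0003 |
| b. [esta] | | * | (-12)(0) + (-4)(1) = -4 | $e^{-4}$ = 0.018... | $e^{-4}/(e^{-12}+e^{-4})$ = .9997 |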

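(26)

| /sta/ | *#sC (w = -12) | DEP (w = -4) | ONS (w = -1) | Harmony score | $e^{H}$ | conditional probability |
|---|---|---|---|---|---|---|
| a. [sta] | * | | | (-12)(1) + (-4)(0) + (-1)(0) = -12 | $e^{-12}$ = 0.000... | $e^{-12}/(e^{-12}+e^{-5})$ = .0009 |
| b. [esta] | | * | * | (-12)(0) + (-4)(1) + (-1)(1) = -5 | $e^{-5}$ = 0.007... | $e^{-5}/(e^{-12}+e^{-5})$ = .9991 |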

As shown in (25), boosting the priority of *#sC essentially eliminates variability -- the faithful candidate's probability has dropped to a level that is not distinct from speech errors (and not likely to be observed in a normal-scale production experiment). In this context, the addition of the same low-priority ons constraint with the same weight only triggers a change in probability of (.9997-.9991) = .0006. In short, variability in MaxEntHG occurs when there are two or more candidates that have relatively close Harmony scores; in this case there can be relatively large observable effects of comparatively low-priority constraints owing to constraint ganging, cf. (23, 24). When the Harmony difference between the best candidate and the runner-up is sufficiently large, MaxEntHG yields an essentially deterministic mapping, cf. (25, 26), and in this case low-priority constraints do not have an observable effect.
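The contrast between tableaux (23, 24) and (25, 26) can be verified numerically. The sketch below is illustrative only: it hard-codes the two-candidate competition of those tableaux, and the helper name `p_prothesis` is hypothetical.

```python
import math


def p_prothesis(w_sc, w_dep, w_ons=0.0):
    """Probability of [esta] over [sta] for /sta/.
    [sta] violates *#sC once; [esta] violates DEP once and,
    when ONS is included, ONS once."""
    h_faithful = w_sc
    h_prothetic = w_dep + w_ons
    return math.exp(h_prothetic) / (math.exp(h_faithful) + math.exp(h_prothetic))


for w_sc in (-5.0, -12.0):
    without_ons = p_prothesis(w_sc, -4.0)        # tableaux (23) and (25)
    with_ons = p_prothesis(w_sc, -4.0, -1.0)     # tableaux (24) and (26)
    shift = without_ons - with_ons
    print(w_sc, round(without_ons, 4), round(with_ons, 4), round(shift, 4))
```

The same ONS weight of -1 shifts the probability of prothesis by roughly .23 when the weight of *#sC is -5, but by less than .001 when it is -12.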

4.6 MaxEntHG analysis of sC prothesis

Having defined and illustrated all the relevant properties of MaxEntHG for modeling language transfer, this section turns to the actual analysis. The first step in adapting an OT analysis for MaxEntHG is to simply retain the tableaux that were already developed, and encode them in a software package that is capable of doing the MaxEntHG calculations. For that purpose, this paper uses PhoMEnt, an open-source Python implementation that is freely available on GitHub ( Ent).

As it turns out, the tableaux that are included in a standard OT analysis are necessary but not sufficient for a MaxEntHG analysis. This is because in standard OT analyses, the analyst may simply stipulate that certain constraints are undominated based on background facts of the language which are uncontroversial. For example, in the OT analysis of Spanish presented above, it was simply stipulated that *CC# was undominated. This stipulation is eminently sensible, as Spanish completely lacks word-final consonant clusters; according to the general logic of OT, this fact is fully explained exactly by having *CC# be undominated. Therefore, we did not include any tableaux earlier which illustrated the false predictions that would ensue if *CC# were not undominated. In MaxEntHG, where constraint weights are set so as to maximize the probability of the data, a constraint is not allocated any weight unless it is necessary to do so in order to avoid selecting losing candidates. Tableau (20) is the primary place where this matters for *CC#, and from inspection of the violation profile, it is evident that loser [pak#s.ta] could also be avoided by assigning a high weight to WdByWd or al-r. Therefore, tableau (20) alone does not compel the learner to assign a high priority to *CC#. Moreover, *CC# is not violated in any other tableaux we have presented so far. In other words, our analysis has yet to explicitly encode the background knowledge that *CC# must be highly prioritized in Spanish. Some 'extra' tableaux must be added to represent this kind of background knowledge.

To handle *CC#, we included a tableau for /apt/ with a faithful candidate [apt] and a deletion candidate [ap]. In the Spanish input file, the frequency of the faithful candidate [apt] was set to 0, and the frequency of the deletion candidate [ap] was set to 1000. The main purpose of this was to provide the learner with evidence that final coda clusters are dispreferred. A secondary purpose was to indicate our belief (based on anecdotal observations) that deletion is the most likely repair that SpEns would apply when producing English words with underlying coda clusters, such as /apt/.

Besides the prioritization on *CC#, it is also necessary to give the learner evidence that gratuitous anaptyxis is dispreferred. To do this, a tableau was included with an underlying onset cluster that is legal in both English and Spanish, /pla/. To parallel the other crucial tableaux, four candidates were included: faithful [pla], prothetic [epla], anaptyktic [pela], and deletion [pa]; in both Spanish and English, the frequencies were set to 1000 for [pla] and 0 for other candidates. The /pla/ tableau ([pla] vs. [pela]) closely resembles the /kta/ tableaux, except that with /kta/, the faithful candidate violates *#TT, where anaptyxis is motivated. Thus, the /kta/ tableau teaches the learner that anaptyxis is tolerated to resolve a *#TT violation ($w_{*\#TT} \gg w_{DEP/C\_C}$), while the /pla/ tableau teaches the learner that gratuitous anaptyxis is dispreferred ($w_{DEP/C\_C} \gg 0$). Besides these tableaux, a /sta/ tableau was included with faithful, prothetic, anaptyktic, and deletion candidates (Spanish: 100 prothetic; English: 100 faithful). This tableau contained information about the relative prioritization of *#sC, dep/s_C, and dep/C_C.

The crucial data to be modeled are the relative counts of prothetic vs. faithful realizations, which were depicted graphically in Fig. 4b. The raw numbers are given in (27) as unnormalized odds:

(27)

| prothesis:faithful | Read-1 | Read-2 | Spoken-1 | Spoken-2 |
|---|---|---|---|---|
| C#sC | 43:23 | 49:19 | 18:46 | 20:31 |
| V#sC | 26:40 | 36:33 | 13:55 | 16:36 |

Note that ambiguity arises with respect to candidates that are string-equivalent but differ in syllabification. For example, the UR /pa#sta/ has a faithful candidate [pa#sta] and a resyllabification candidate [pa#s.ta]. These forms are string-equivalent in that they consist of the same string of segments in the same order, and differ only in whether the [s] is syllabified with the preceding or the following vowel. Since we did not distinguish these experimentally (this would have been assuming the answer to a hypothesis the experiment was testing), these outputs are ambiguous between the two parses. For modeling purposes, the observed counts of V#sC non-prothesis were evenly divided between these two candidates. For example, there were 40 tokens without prothesis in the first repetition of the READ SENTENCES condition (Read-1 above); these were evenly split so that the 'observed' frequency of [pa#sta] was 20 and the 'observed' frequency of [pa#s.ta] was 20. There are two more pairs that are similarly ambiguous for the UR /pak#sta/. One is the prothesis candidates, [pak#es.ta] and [pa.k#es.ta]. The observed counts from the experiment were similarly split between these candidates for each modality/repetition. The other ambiguous pair was [pak#sta] and [pak#s.ta]; in this case the observed counts were assigned entirely to the faithful candidate, because preliminary modeling indicated that the candidate with a coda cluster was assigned negligible probability. The complete input files used are contained in the Supplementary Materials.
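The allocation of observed counts to string-equivalent candidates can be sketched as follows for the Read-1 condition. This is an illustration under stated assumptions, not the procedure used to build the actual input files: the helper `split_evenly` is hypothetical, and the rule that any odd remainder goes to the first-listed candidate is an assumption (it yields 22 and 21 for the 43 C#sC prothesis tokens).

```python
def split_evenly(count, candidates):
    """Divide an integer count across string-equivalent candidates.
    Any remainder is given to the earlier candidates (assumption)."""
    share, remainder = divmod(count, len(candidates))
    return {
        cand: share + (1 if i < remainder else 0)
        for i, cand in enumerate(candidates)
    }


# Read-1 counts from (27): C#sC 43 prothetic, 23 faithful;
#                          V#sC 26 prothetic, 40 faithful.
observed = {}
observed.update(split_evenly(40, ["[pa#sta]", "[pa#s.ta]"]))        # V#sC, no prothesis
observed["[pa#es.ta]"] = 26                                         # V#sC, prothesis
observed.update(split_evenly(43, ["[pak#es.ta]", "[pa.k#es.ta]"]))  # C#sC, prothesis
observed["[pak#sta]"] = 23   # C#sC, no prothesis: all assigned to the faithful candidate
observed["[pak#s.ta]"] = 0   # coda-cluster candidate receives no counts

for candidate, count in observed.items():
    print(candidate, count)
```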

4.7 Simulation 1: Direct fits

Simulation 1 consisted of directly fitting constraint weights so as to maximize posterior probability of the data. That is, this simulation asks, 'What is the best grammar for generating the SpEn data?' If it turns out that the best grammar for generating the SpEn data is strictly 'between' the Spanish and English grammars (in the sense defined earlier), this constitutes powerful support for the idea that language transfer can be modeled in MaxEntHG as done here. Note however that Simulation 1 does not directly test the hypothesis that language transfer can be modeled as linear interpolation, which is the most conservative interpretation as to how one grammar can be 'between' two others. It is logically possible that the crucial aspects of the SpEn data (relative rates of prothesis in C#sC and V#sC) could be generated by weight vectors that are intermediate between Spanish and English even if the statistically optimal weight vector is not intermediate. Therefore, Simulation 1 tests a more liberal meaning of 'between', and Simulation 2 tests the more conservative proposal of linear interpolation.

The prior for PhoMEnt consists of L1 and L2 regularization, i.e. a penalty that scales with the sum of the weights and the sum of the squares of the weights, respectively. The regularization coefficients were $\lambda_1 = \lambda_2 = .001$, which is fairly small in comparison to the 50 or more critical data points per input file. The resulting weights are reported in (28):

(28)

| Constraint | Spanish | Read-1 | Read-2 | Spoken-1 | Spoken-2 | English |
|---|---|---|---|---|---|---|
| Max | -21.88 | -21.33 | -21.73 | -20.99 | -21.11 | -17.27 |
| *#TT | -16.41 | -16.56 | -16.59 | -16.59 | -16.6 | -17.22 |
| *CC# | -31.54 | -21.32 | -21.73 | -20.98 | -21.1 | -1.1 |
| *#sC | -22.73 | -15.13 | -15.47 | -14.12 | -14.58 | 0 |
| DEP/s_C | -14.75 | -13.84 | -14.27 | -13.48 | -13.61 | -2.83 |
| WdByWd | -3.7 | -4.23 | -4.34 | -4.35 | -4.38 | -1.33 |
| DEP/#_C | -5.01 | -4.12 | -3.82 | -4.96 | -4.61 | -4.35 |
| DEP/C_C | -8.42 | -8.59 | -8.6 | -8.61 | -8.62 | -9.28 |
| ONS | -3.26 | -2.6 | -2.58 | -2.77 | -2.68 | -4 |
| *COD | -2.69 | -4.19 | -4.41 | -3.51 | -3.84 | -3.91 |
| Al-L | -3.7 | -4.12 | -4.24 | -3.83 | -3.95 | -6.28 |
| Al-R | -2.24 | -2.67 | -2.75 | -2.46 | -2.57 | -1.64 |
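The quantity being optimized in Simulation 1 can be sketched as a penalized negative log-likelihood. The code below is a minimal illustration and not PhoMEnt's implementation or interface: the function name, the data layout, and the exact form of the penalty (sum of absolute weights and sum of squared weights, each scaled by .001) are assumptions based on the verbal description of the prior. The toy tableau is likewise hypothetical.

```python
import math


def penalized_neg_log_likelihood(tableaux, weights, lambda1=0.001, lambda2=0.001):
    """tableaux: {input: {candidate: (count, {constraint: violations})}}
    weights: {constraint: weight}, negative as in the text.
    Returns -log P(data | weights) plus L1 and L2 penalties (assumed form)."""
    nll = 0.0
    for candidates in tableaux.values():
        harmony = {
            cand: sum(weights[c] * v for c, v in viols.items())
            for cand, (_, viols) in candidates.items()
        }
        log_z = math.log(sum(math.exp(h) for h in harmony.values()))
        for cand, (count, _) in candidates.items():
            nll -= count * (harmony[cand] - log_z)
    l1 = sum(abs(w) for w in weights.values())
    l2 = sum(w * w for w in weights.values())
    return nll + lambda1 * l1 + lambda2 * l2


# Toy Spanish-like data: 100 prothetic tokens, 0 faithful tokens.
toy = {"/sta/": {"[sta]": (0, {"*#sC": 1}), "[es.ta]": (100, {"DEP/#_C": 1})}}
good = {"*#sC": -10.0, "DEP/#_C": -1.0}   # penalizes the unattested candidate
bad = {"*#sC": -1.0, "DEP/#_C": -10.0}    # penalizes the attested candidate

print(penalized_neg_log_likelihood(toy, good))
print(penalized_neg_log_likelihood(toy, bad))
```

A weight-fitting procedure searches for the weights that minimize this quantity; with coefficients this small, the likelihood term dominates whenever the data contain more than a handful of tokens.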

Quick inspection of (28) reveals that most of the optimal weights for the SpEn data (Read-1, etc...) are intermediate between the Spanish and English values. For example, the fourth row represents the key markedness constraint *#sC. As expected, this constraint is highly prioritized in Spanish (w=-22.76) and given no priority in English (w=0); in the SpEn data this constraint is weighted around w = -14.5, which is truly intermediate between the Spanish and English values. This type of value is exactly what is expected if the SpEn grammar interpolates between the Spanish and English grammars. There is however another case: SpEn weights that are technically intermediate between Spanish and English, but hover very close to one end or the other end. For example, dep/s_C has a weight of -14.75 in the Spanish grammar, hovers around -14 in the SpEn grammars, but has the value -2.83 in the English grammar. Obviously the SpEn values are almost at the Spanish 'end' of the Spanish-English range. All of the constraints that have a medium or high priority in Spanish or English (i.e. a weight with a magnitude of 5 or greater in either language) are of these two types, i.e. 'between' the Spanish and English weights.

Finally, however, there are constraints whose SpEn weights fall slightly outside the Spanish-English range. For example, al-r has the weight -2.67 in the Read-1 condition, which is outside the range of -2.24 (Spanish) to -1.64 (English). These are not highly concerning, since most of the relevant constraints are low-priority in both languages, and the SpEn weight does not lie more than 1 away from the Spanish-English range. The constraints of this type are ons, *cod, al-r, and WdByWd; these share the properties that they (i) govern prosody/morphology alignment, and (ii) are freely violated in both Spanish and English.
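The three situations described above (truly intermediate, intermediate but near one end, and slightly outside the range) can be checked directly against (28). The sketch below uses only the three constraints discussed as examples in the text, with their Spanish, Read-1, and English weights; the function names are hypothetical.

```python
# (Spanish, Read-1, English) weights copied from (28).
ROWS = {
    "*#sC":    (-22.73, -15.13, 0.0),
    "DEP/s_C": (-14.75, -13.84, -2.83),
    "Al-R":    (-2.24, -2.67, -1.64),
}


def is_between(w, x, y):
    """Strict betweenness for a single weight, as defined in Section 4.5.2."""
    return min(x, y) < w < max(x, y)


def distance_outside(w, x, y):
    """How far w lies outside the interval spanned by x and y (0 if inside)."""
    lo, hi = min(x, y), max(x, y)
    return max(lo - w, w - hi, 0.0)


for name, (spanish, read1, english) in ROWS.items():
    print(name,
          is_between(read1, spanish, english),
          round(distance_outside(read1, spanish, english), 2))
```

For these rows, *#sC and DEP/s_C are between the Spanish and English values, while Al-R falls outside the range by less than 1.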

A subset of the PhoMEnt output file for the Read-1 condition is given in (29), so that the reader may evaluate the quality of the modeling predictions.

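(29)

| UR | SR | observed | expected | harmony | probability |
|---|---|---|---|---|---|
| /pla/ | [pla] | 1000 | 999.8 | 0 | .9998 |
| | [pe.la] | | 0.2 | -8.61 | .0002 |
| | [ep.la] | | | -15.05 | -- |
| | [pa] | | | -21.36 | -- |
| /sta/ | [sta] | | | -15.16 | .4729 |
| | [se.ta] | | | -22.47 | .0003 |
| | [es.ta] | | | -15.05 | .5258 |
| | [ta] | | | -21.36 | .0010 |
| /kta/ | [kta] | | | -16.60 | .0003 |
| | [ke.ta] | 100 | 99.8 | -8.61 | .9981 |
| | [ek.ta] | | 0.2 | -15.05 | .0016 |
| | [ta] | | | -21.36 | -- |
| /pa#sta/ | [pa#sta] | 20 | 21.8 | -15.16 | .3296 |
| | [pa's.ta] | 20 | 20 | -15.24 | .3031 |
| | [pa#se.ta] | | | -22.47 | .0002 |
| | [pa#es.ta] | 26 | 24.2 | -15.05 | .3665 |
| | [pa#ta] | | | -21.36 | .0007 |
| /pak#sta/ | [pak#sta] | 23 | 21.3 | -19.33 | .3224 |
| | [pak's.ta] | | | -36.59 | -- |
| | [pak#se.ta] | | | -26.64 | .0002 |
| | [pak#es.ta] | 22 | 23.7 | -19.22 | .3585 |
| | [pa.k"es.ta] | 21 | 21 | -19.34 | .3183 |
| | [pak#ta] | | | -25.52 | .0007 |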

Note that the model perfectly captures the categorical mappings, but also yields a good fit on items with variability. The observed odds of prothesis in the V#sC context are 26:40, while the model predicts 24.2:41.8. The observed odds of prothesis in the C#sC context are 43:23, while the model predicts 44.7:21.3. The model is therefore capturing the degree of asymmetry in the preference for prothesis in C#sC versus V#sC contexts, although it actually predicts a modestly higher asymmetry than is observed.
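The predicted odds for the V#sC context can be recovered from the harmony scores in (29). The following sketch is illustrative: it takes the two-decimal harmony values for /pa#sta/ at face value, so the results agree with the reported figures only up to rounding.

```python
import math

# Harmony scores for /pa#sta/ in the Read-1 fit, copied from (29).
harmony = {
    "[pa#sta]": -15.16,
    "[pa's.ta]": -15.24,
    "[pa#se.ta]": -22.47,
    "[pa#es.ta]": -15.05,
    "[pa#ta]": -21.36,
}
n_tokens = 26 + 40  # observed V#sC tokens in Read-1, from (27)

z = sum(math.exp(h) for h in harmony.values())
expected = {sr: n_tokens * math.exp(h) / z for sr, h in harmony.items()}

prothesis = expected["[pa#es.ta]"]
print(round(prothesis, 1), round(n_tokens - prothesis, 1))
```

The expected count for the prothetic candidate comes out at about 24.2 of 66 tokens, leaving about 41.8 for the remaining candidates, in line with the predicted odds of 24.2:41.8.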

In summary, Simulation 1 offers qualified support for the hypothesis that language transfer represents a grammatical state that is intermediate between the native/L1 and target/L2 languages. When the relevant constraints were trained on the observed SpEn data collected in the experiment so as to maximize posterior data likelihood, all of the highly prioritized weights fell in the range of values that was strictly between the analogous values for the more categorical Spanish and English grammars. A mildly troubling finding of this simulation was that the ideal SpEn weights for low-priority constraints that regulate prosody-morphology alignment did not always fall in the English-Spanish range. However, they did not fall far outside the English-Spanish range.

Moreover, it is not clear that Simulation 1 is the best test of the hypothesis that language transfer can be modeled in MaxEntHG by grammars that are 'between' the native and target. Simulation 1 asked what is the best weight vector to generate the observed data, rather than whether a weight vector that is strictly between the Spanish and English weights can generate the data. The mathematical term meaning 'strictly between' two vectors is formalized by the idea of linear interpolation, and this is addressed in Simulation 2.

4.8 Simulation 2: Interpolation, with a tunable effort parameter

Simulation 1 indicated that the optimal constraint weights for modeling SpEns experimental data largely fell in 'between' the optimal Spanish and English weights. While this result may be interpreted as offering tentative support for the idea of language transfer as 'interpolation', it does not offer a fully predictive theory. The meaning of 'interpolation' within an algebraic system such as the weightspace in MaxEntHG is rather more specific: it is the idea that the intermediate grammar is a weighted average of the two endpoint grammars, as shown in (30).

(30) $w_{SpEn} = a \cdot w_{En} + (1-a) \cdot w_{Sp}$

Linear interpolation is a considerably more restrictive theory than merely stating that the intermediate grammar will be 'between' the two endpoint grammars, because it defines a single-dimensional line on which the SpEn grammars should fall (in contrast, the idea that weights are simply 'between' the English and Spanish values defines a 12-dimensional 'box').

There is already evidence that a model of SpEn data with a single linear interpolation coefficient is too restrictive to model the existing data. For one thing, the precise rates of prothesis varied across modality and repetition. Another issue pertains to the relation between cognitive effort and speech production. It is generally recognized that non-native language production is more effortful than native language production. The trained weights for the English and Spanish grammars reflect native grammars, while weights for SpEn grammars should in some sense reflect the greater effort involved in their non-native productions. As noted earlier in the phonological analysis, the constraint WdByWd is a low-priority constraint that regulates prosody/morphology alignment, and might be taken as an index of cognitive planning/effort. Moreover, Simulation 1 showed that the 'optimal' weight for this constraint was outside of (and higher than) the English-Spanish range. Therefore, we modify the model in (30) slightly, by allowing $w_{WdByWd}$ to vary while the remaining weights are determined by $a$. Fig. 6 plots the predicted log-odds of prothesis in V#sC and C#sC context for a variety of values of $a$ and $w_{WdByWd}$.

[Figure 6 plot; axis label: log odds of prothesis in C#sC]

Figure 6. Predicted log-odds of prothesis in C#sC vs. V#sC contexts as a function of interpolation coefficient and 'effort' constraint weight.

The observed values from the experiment (Read-1, etc.) are plotted as text, while each curve represents a differing value of $a$ (increments of .01). The weight $w_{WdByWd}$ was varied between 2 and 10; as evident from Fig. 6, the observed values have a comparatively high $w_{WdByWd}$. The best-matching values are given below:

• Spoken-1: $a$ = 0.35, $w_{WdByWd}$ = -6.3

• Spoken-2: $a$ = 0.32, $w_{WdByWd}$ = -6.7

• Read-1: $a$ = 0.28, $w_{WdByWd}$ = -6.3

• Read-2: $a$ = 0.27, $w_{WdByWd}$ = -7.0

According to this model, the modality/repetition effect is best captured by variation in the interpolation coefficient, with Spoken-1 representing the most English-like (the inferred weights are 0.35 of the way toward English from Spanish) and Read-2 representing the least English-like.
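The interpolation in (30), with the weight of WdByWd set separately, can be sketched using the Spanish and English columns of (28). This is an illustration only: the function name `interpolate` is hypothetical, and the example uses the Read-1 values reported above ($a$ = 0.28, WdByWd weight of -6.3).

```python
# Spanish and English weights copied from (28).
SPANISH = {"Max": -21.88, "*#TT": -16.41, "*CC#": -31.54, "*#sC": -22.73,
           "DEP/s_C": -14.75, "WdByWd": -3.7, "DEP/#_C": -5.01,
           "DEP/C_C": -8.42, "ONS": -3.26, "*COD": -2.69,
           "Al-L": -3.7, "Al-R": -2.24}
ENGLISH = {"Max": -17.27, "*#TT": -17.22, "*CC#": -1.1, "*#sC": 0.0,
           "DEP/s_C": -2.83, "WdByWd": -1.33, "DEP/#_C": -4.35,
           "DEP/C_C": -9.28, "ONS": -4.0, "*COD": -3.91,
           "Al-L": -6.28, "Al-R": -1.64}


def interpolate(a, w_wdbywd):
    """Weights a fraction `a` of the way from Spanish to English, as in (30),
    except that the 'effort' constraint WdByWd is set independently."""
    weights = {c: a * ENGLISH[c] + (1 - a) * SPANISH[c] for c in SPANISH}
    weights["WdByWd"] = w_wdbywd
    return weights


read1 = interpolate(0.28, -6.3)
for constraint, weight in read1.items():
    print(constraint, round(weight, 2))
```

Setting $a$ = 0 recovers the Spanish weights and $a$ = 1 the English weights (apart from WdByWd), so the two fitted parameters fully determine the 12-dimensional SpEn weight vector.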

4.9 Summary and Discussion

This section investigated the hypothesis that language transfer (sC prothesis in SpEns) can be modeled in MaxEntHG by constraint weight vectors that are 'between' the native/L1 and target/L2 weights. Following Broselow et al. (1998), the initial sections developed a constraint-based analysis of sC prothesis; the analysis was then adapted for MaxEntHG by incorporating additional tableaux to reflect background knowledge of English and Spanish phonotactics. Two simulations were then conducted assessing how well the MaxEntHG analysis explained the observed data collected in the experiment. Simulation 1 used the standard maximum entropy weight-fitting procedure to identify the 'best' weights, given the observed rates of prothesis in the V#sC and C#sC contexts. All of the highly-prioritized constraints in the SpEn grammars were assigned weights that were intermediate between the Spanish and English weights that resulted from the same fitting procedure, although some of the low-priority constraints regulating prosody-morphology alignment fell slightly outside the Spanish-English range. Because there is no guarantee that non-native speakers are using an 'optimal' grammar for the speech that they produce, a followup was conducted. Simulation 2 tested the more restrictive hypothesis that SpEns' relative rates of prothesis in the experiment could be generated by linear interpolation between the Spanish and English weights, with an additional free parameter added to allow for the comparative difficulty of producing speech in a non-native language. The results showed that the modality/repetition variation in the experimental data was modeled fairly well by varying the interpolation coefficient between about 0.27 and 0.35, with the intuitive meaning that the SpEn weight vector falls about 30% of the way along a line drawn from the Spanish weight vector to the English weight vector. These results are generally consistent with the idea that language transfer can be modeled as linear interpolation in MaxEntHG, once provision is made for the additional effort involved in non-native language production.

5 General Discussion

This paper has reported a production study with non-native speakers of English whose first language is Mexican Spanish (SpEns), as well as computational modeling of the production data. The phenomenon under investigation was sC prothesis, whereby speakers insert a prothetic vowel when attempting to produce an English word that begins with an [s]-consonant onset cluster, e.g. school → (e)school. Carlisle (1991a) proposed the resyllabification account of sC prothesis, whereby SpEns prothesize in order to yield a syllabification that satisfies Spanish phonotactics, accounting for the phenomenon as some kind of language transfer. The present paper extends and tests that proposal along several dimensions. In the production study, 4 SpEns produced sC-initial words in isolation and sentence-finally. The results were analyzed categorically (prothesis occurred or not), and acoustic measurements were made as well. Prothesis was found to occur more frequently post-consonantally than post-vocalically (C#sC > V#sC), an effect that was expected both by the resyllabification account and by previous work. A previously untested prediction of the resyllabification account was that the [s] would have coda characteristics when prothesis occurs, but onset characteristics when it does not. This prediction was fully confirmed. The computational modeling section addressed the predictive value of the idea that the SpEn grammar is in some sense 'intermediate' between the L1/Spanish and L2/English grammars using the constraint-based phonological framework known as MaxEntHG. To test this idea properly, it was necessary to develop a phonological analysis of sC prothesis as it interacts with resyllabification, including resyllabification across a word boundary. The constraint set included various constraints from Fleischacker's (2001, 2005) analysis of loanword phonology, and is consistent with the typological fact that anaptyxis is always preferred for non-sC clusters.
The two simulations were broadly consistent with the idea that language transfer can be modeled as interpolation between the L1 and L2 grammars, once provision is made for the extra cognitive effort involved in non-native language production. The remainder of the paper addresses various issues and implications of this work.

5.1 Alternative accounts of the duration contrast

In the phonetic analysis of SpEn productions, we found that the durational characteristics of SpEns' [s] were predictable based on whether prothesis had occurred or not: when prothesis occurred, [s] had the same duration distribution as the native speaker's coda distribution, but when prothesis did not occur, [s] had a duration distribution matching the native speaker's onset cluster distribution. This is predicted by the resyllabification account (provided that SpEns are sensitive to allophonic variation in non-native phonemes). At the very least, this test provided a chance for the resyllabification account to be falsified, and it was not falsified. In that sense, it is fair to say that the results support or are at least consistent with the resyllabification account.

However, it is entirely possible that alternative theories (i.e. ones which do not require reference to resyllabification of [s] in sC prothesis) might explain the same data. The simplest imaginable theory is that segmental duration is determined purely by the number of segments in some kind of larger metrical unit (e.g. the foot, or prosodic word). Under this account, the [s] is shorter when prothesis occurs because there are more segments, and the [s]'s duration must be shortened to accommodate the extra time taken up by the prothetic [e]. This account (or at least the most extreme version of it, which does not make reference to syllabification at all) does not straightforwardly explain why the duration of [s] varies even in native speakers as a function of putative syllable position (Redford, 2004; also our own experimental data). It is also logically possible that the duration of [s] varies systematically according to some other contextual properties (word position, neighboring segments) in a way that does not require reference to (re)syllabification, and is amenable to a more sophisticated version of the compression account. A reviewer points out "that any serious investigation of a possible compression effect must measure the durations of ALL the relevant segments, not only one. Compression would presumably affect all segments within a compression domain, with the possibility of tradeoffs in duration across segments."

We agree that the durational analysis presented in the experiment does not necessarily rule out alternative hypotheses to the resyllabification account. Our aim in this paper was not to compare the resyllabification account against other hypotheses, but simply to test whether some of its core predictions could be directly falsified. They were not falsified, which means the resyllabification account is able to explain the data we have seen. Of course the possibility remains that other hypotheses are also able to explain the same data, and possibly to give a better explanation; we leave this possibility to future research.

5.2 Some thoughts on the modality and repetition effects

In the discussion of the experiment section, we briefly discussed the variation we observed in the prothesis rate between spoken and read tasks, as well as the repetition effect. The basic finding was that the prothesis rate was higher across the board when the stimuli were read than when they were spoken aloud, and that the prothesis rate was higher on the second production than on the initial production. As the reading aloud task involves orthographic decoding, we speculated that it is more cognitively effortful than the speaking aloud task. That is, while both tasks require phonological planning, there are fewer resources available for phonological planning in the reading aloud task, because more resources are required to process non-native orthography. With fewer resources allocated to phonological planning, a SpEn is more likely to rely on the L1/Spanish grammar rather than realizing the L2/English mapping correctly.

An alternative but not incompatible perspective is offered by several reviewers, highlighting the role of orthography in priming phonological representations. Hallé et al. (2008) argue that orthographic sC forms are phonologically repaired to the corresponding esC forms by Spanish speakers during native reading. This claim is motivated by a priming asymmetry in a lexical decision task. There is a larger priming effect for sC-esC pairs (e.g. stación-estación) than for other pairs that share the same level of orthographic overlap (e.g. stuto-astuto). The effect is only observed when the stimulus onset asynchrony is sufficiently large, which the authors interpret as evidence of phonological mediation rather than visual/orthographic mediation. Berent (2008) conducted a series of lexical decision tasks with illegal onset clusters for English listeners/readers, including attested clusters (blif), modestly ill-formed sonority rises (*bnif), more ill-formed sonority plateaus (*bdif), and highly ill-formed sonority falls (*lbif). When the stimuli were presented auditorily, listeners were generally quicker to reject items that were more ill-formed (except that they did not differ on the most ill-formed lbif and most well-formed blif items, which Berent interpreted as evidence of perceptual repair). However, when the stimuli were presented orthographically, there was no gradient effect of ill-formedness on rejection speed. The Berent data are difficult to accommodate under the hypothesis that orthography especially primes phonological representations (while auditory stimuli especially prime acoustic representations), because in that case the gradient effect of ill-formedness on reaction time is expected to be stronger for orthographic stimuli than for auditory stimuli.
In summary, it is not entirely clear whether or under what conditions orthographic stimuli that represent phonologically illicit items in the speakers' native languages will or will not activate phonological repairs.

5.3 Language transfer as linear interpolation

The idea that non-native speakers' behavior is in some sense 'intermediate' between the native/L1 and target/L2 grammars is not a new one (e.g. Broselow, Chen, & Wang, 1998). The present study has gone beyond previous experimental work in assessing the phonetic predictions of a particular instance of this theory: the resyllabification account of sC prothesis in SpEns predicts that [s] will have coda characteristics when prothesis occurs, and onset characteristics otherwise, which is what was found for duration.

The present study has extended previous theoretical work on non-native phonology (e.g. Broselow, Chen, & Wang, 1998; Fleischhacker, 2001, 2005) by adapting it for Maximum Entropy Harmonic Grammar (MaxEntHG). MaxEntHG differs from Optimality Theory in being stochastic and having weighted constraints rather than constraint ranking. Because it is stochastic, MaxEntHG has the potential to model variation, which is ubiquitous in language transfer effects. As for constraint rankings, there is no general criterion to decide whether one ranking is 'between' two other ones. However, there are mathematically principled criteria for whether one constraint weight vector w is between two other ones x and y. The weaker sense (that every w_i is numerically between the corresponding x_i and y_i) was tested in Simulation 1 by fitting constraint weights to the production data collected in the experiment (as well as to idealized data representing the Spanish and English grammars). All of the high priority constraints in the SpEn data were 'between' the Spanish and English values, although this was not true for the lowest priority constraints regulating prosody/morphology alignment (e.g. ons). The stronger sense is that the value of the intermediate grammar can be computed by a simple weighted average of the native/L1 and target/L2 grammars, formalized here as w_SpEn = a · w_En + (1 − a) · w_Sp. In other words, the SpEn data is literally characterized as linear interpolation between the Spanish and English weight vectors. This more restrictive theory was not sufficient to predict the SpEn data on its own, but it made for an excellent fit to the data once provision was made for the extra difficulty of speech production in the non-native language (represented in the modeling section with a higher weight for the 'effort' constraint WdByWd).
The modality and repetition effects observed in the experiment (more prothesis for read speech, more prothesis on repetition) were well-modeled by relatively small variation in the interpolation coefficient (a was in the range 0.27-0.35). In other words, the computational modeling suggests the SpEn speakers' grammar is about 30% of the way from Spanish to English (at least in the aggregate; Table 4 shows there is considerable variability across speakers). Therefore, this work has extended the expressive power of phonological theory by numerically modeling the variation that occurs in language transfer. The proposal we offered in Simulation 2 of modeling language transfer as linear interpolation is relatively restrictive, as evidenced by its ability to explain the small but robust effect of preceding context on prothesis rate (given a suitable phonological analysis). Therefore, while this study is small in scale, the results are fairly promising for the more general proposal to model language transfer as linear interpolation.
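The two senses of 'betweenness' discussed in this section can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the model fitted in the simulations: the two constraints (a markedness constraint against word-initial sC and a faithfulness constraint against vowel insertion), their names, and all weight values are hypothetical, and the 'effort' constraint WdByWd is not modeled. The sketch shows only the mechanics of interpolating weight vectors with a coefficient a and of turning weighted violations into MaxEnt output probabilities.

```python
import math

# Hypothetical two-constraint grammar; names and weights are illustrative
# assumptions, not the constraint set or fitted values from the paper.
CONSTRAINTS = ["*#sC", "Dep-V"]
W_SPANISH = [6.0, 1.0]   # assumed: sC onsets heavily penalized
W_ENGLISH = [0.0, 5.0]   # assumed: vowel insertion heavily penalized

# Violation counts of each candidate output for an sC-initial input.
CANDIDATES = {
    "faithful": [1, 0],   # keeps the sC onset
    "prothesis": [0, 1],  # inserts a prothetic vowel
}


def interpolate(w_sp, w_en, a):
    """Stronger sense: w_SpEn = a * w_En + (1 - a) * w_Sp, per constraint."""
    return [a * en + (1 - a) * sp for sp, en in zip(w_sp, w_en)]


def is_between(w, x, y):
    """Weaker sense: every w_i lies numerically between x_i and y_i."""
    return all(min(xi, yi) <= wi <= max(xi, yi) for wi, xi, yi in zip(w, x, y))


def maxent_probs(weights, candidates):
    """MaxEnt probabilities: P(c) is proportional to exp(-sum_i w_i * v_i(c))."""
    scores = {
        name: math.exp(-sum(w * v for w, v in zip(weights, viols)))
        for name, viols in candidates.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    # a = 0 is the Spanish grammar, a = 1 the English grammar.
    for a in (0.0, 0.27, 0.35, 1.0):
        w = interpolate(W_SPANISH, W_ENGLISH, a)
        p = maxent_probs(w, CANDIDATES)["prothesis"]
        print(f"a={a:.2f}  between={is_between(w, W_SPANISH, W_ENGLISH)}  "
              f"P(prothesis)={p:.3f}")
```

Any weight vector produced by interpolation with 0 ≤ a ≤ 1 automatically satisfies the weaker criterion, whereas the converse does not hold; this is why the interpolation proposal is the more restrictive of the two.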

Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Acknowledgments: We wish to acknowledge Ellen Broselow, Shigeto Kawahara, and a number of anonymous reviewers for constructive criticism.

Conflict of Interest: None


References

Abrahamsson, Nicolas. 1999. Vowel epenthesis of /sC(C)/ onsets in Spanish/Swedish interphonology: A longitudinal case study. Language Learning 49(3), 473-508.

Berent, Iris. 2008. Are phonological representations of printed and spoken language isomorphic? Evidence from the restrictions on unattested onsets. Journal of Experimental Psychology: Human Perception & Performance, 34, 1288-1304.

Berent, Iris, Donca Steriade, Tracy Lennertz, & Vered Vaknin. 2007. What we know about what we have never heard: evidence from perceptual illusion. Cognition 104, 591-630.

Berger, Adam, Stephen Della Pietra & Vincent Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39-71.

Boersma, Paul & Silke Hamann. 2009. Loanword adaptation as first-language phonological perception. In Andrea Calabrese & W. Leo Wetzels (eds.): Loanword Phonology, 11-58. Amsterdam: John Benjamins.

Broselow, Ellen. 2015. The typology of position-quality interactions in loanword vowel insertion. In Y. Hsiao and L-H. Wee (eds.) Capturing Phonological Shades, 292-319. Cambridge Scholars Publishing.

Broselow, Ellen, Su-I Chen, & Chilin Wang. 1998. The emergence of the unmarked in second language phonology. Studies in Second Language Acquisition 20, 261-280.

Canfield, D. Lincoln. 1981. Spanish pronunciation in the Americas. Chicago: University of Chicago Press.

Carlisle, Robert. 1988. The effect of markedness on epenthesis in Spanish/English interlanguage phonology. Issues and Developments in English and Applied Linguistics 3, 15-23.

Carlisle, Robert. 1991a. The influence of environment on vowel epenthesis in Spanish/English interphonology. Applied Linguistics 12(1), 76-95.

Carlisle, Robert. 1991b. The influence of syllable structure universals on the variability of interlanguage phonology. In Volpe AD (ed) The seventeenth LACUS forum 1990. Lake Bluff, IL: Linguistic Association of Canada and the United States, pp. 135-145.

Carlisle, Robert. 2002. Syllable structure universals and second language acquisition. International Journal of English Studies 1(1), 1-19.

Daland, Robert, Bruce Hayes, James White, Marc Garellek, Andrea Davis, & Ingrid Norrmann. 2011. Explaining sonority projection effects. Phonology 28(2), 197-234.

Eisner, Jason. 2002. Parameter estimation for probabilistic finite-state transducers. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. East Stroudsburg, PA: Association for Computational Linguistics. http://

Flege, James E. 1988. Factors affecting degree of perceived foreign accent in English sentences. Journal of the Acoustical Society of America 84(1), 70-79.

Fleischhacker, Heidi. 2001. Cluster-dependent epenthesis asymmetries. In A. Albright and T. Cho (eds.), Papers in Phonology 5, UCLA Working Papers in Linguistics, 71-116.

Fleischhacker, Heidi. 2005. Similarity in phonology: Evidence from reduplication and loan adaptation. Ph.D. dissertation, UCLA.

Goldstein, Brian. 2001. Transcription of Spanish and Spanish-influenced English. Communication Disorders Quarterly 23(1), 54-60.

Goldwater, Sharon & Mark Johnson. 2003. Learning OT constraint rankings using a maximum entropy model. In Spenader, Eriksson, & Dahl (eds) Proceedings of the Workshop on Variation within Optimality Theory. Stockholm University, pp. 111-120.

Gouskova, Maria. 2004. Relational hierarchies in OT: the case of Syllable Contact. Phonology 21:2, pp. 201-250.

Hallé, Pierre, Alberto Dominguez, Fernando Cuetos, & Juan Segui. 2008. Phonological mediation in visual masked priming: evidence from phonotactic repair. Journal of Experimental Psychology: Human Perception and Performance 34 (1), 177-192.

Harris, James. 1983. Syllable Structure and Stress in Spanish: A Nonlinear Analysis. Cambridge, MA: The M.I.T. Press.

Harris, James. 1987. Epenthesis Processes in Spanish. In Neidle C and Nuñez Cedeño RA (eds) Studies in Romance Languages. Dordrecht: Foris Publications, pp. 107-122.

Hayes, Bruce & Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39, 379-440.

Hualde, Jose. 2005. The Sounds of Spanish. Cambridge: Cambridge University Press.

Jaeger, T. Florian. 2008. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language 59, 434-446.

Macpherson, Ian R. 1975. Spanish Phonology: Descriptive and Historical. Manchester: Manchester University Press.

Mazzoni, Dominic et al. (2007) Audacity (Version 1.2.5) [Computer program]. Retrieved January 24, 2007, from http://audacity.

Núñez Cedeño, Rafael & Alfonso Morales-Front. 1999. Fonología generativa contemporánea de la lengua española. Washington, DC: Georgetown University Press.

Pater, Joe. to appear. Universal Grammar with Weighted Constraints. In John McCarthy and Joe Pater (eds.) Harmonic Grammar and Harmonic Serialism. London: Equinox.

Prince, Alan & Paul Smolensky. 1993/2002/2004. Optimality Theory: Constraint interaction in generative grammar. Blackwell Publishers (2004), Rutgers Optimality Archive #537-0802 (2002), Technical Report, Rutgers University Center for Cognitive Science and Computer Science Department, University of Colorado at Boulder (1993).

Raftery, Adrian E. 1995. Bayesian model selection in social research. Sociological Methodology 25, 111-163.

Rauber, Andréia S. & Barbara O. Baptista. 2004. The production of English initial /s/ clusters by Portuguese and Spanish EFL speakers. Revista de Estudos da Linguagem 12, 459-474.

Redford, Melissa. 2004. Origin of consonant duration patterns. In Agwuele A, Warren W and Park S (eds) Proceedings of the 2003 Texas Linguistics Society Conference. Somerville, MA: Cascadilla Proceedings Project, pp. 54-61. http://www. TLS 03.pdf

Shaw, Jason & Lisa Davidson. 2012. Sources of illusion in consonant cluster perception. Journal of Phonetics 40(2), 234-248.

Selkirk, Elisabeth. 1984. On the major class features and syllable theory. In Language Sound Structures, eds. Mark Aronoff and R. T. Oehrle, 107-136. Cambridge, Mass.: MIT Press.

Sievers, Eduard. 1881. Grundzüge der Phonetik. Leipzig: Breitkopf & Härtel.

Hayes, Bruce, Robert Kirchner, & Donca Steriade. 2004. Phonetically-based phonology. Cambridge.

Urbanek, Simon & Stefano Iacus. 2009. R (Version 2.10.1) [Computer program].

Yavaş, Mehmet & Jessica Barlow. 2006. Acquisition of #sC clusters in Spanish-English bilingual children. Journal of Multilingual Communication Disorders 4(3), 182-193.

Appendix A: Experimental Stimuli

A.1 sC sentences

C#sC

sp: He is a weak spokesman. The athlete quite sports. He eats soup with spoons.

st: He bought stamps. The frog leaps steadily. He made food with starch.

sk: I'll book screenwriters. As a judge, I trap scapegoats. In the musical, I choreograph scarecrows.

sm: He has a fantastic smile. The dog cannot smell. He wants cheap smokes.

sn: He caught snakes. With this pill, he'll stop snoring. I went with Jeff Sneider.

sl: I have new lilac sleepwear. You have to cook it slow. The cows went to a rough slaughterhouse.

V#sC

sp: We found a whale because of the spout. They didn't let Mary speak. We saw Mario spying.

st: We saw a fly and Alyssa stomped it. This weekend we'll go to Stanford. I heard Jesse steals.

sk: We made Sarah skewers. Joe picked at the scab. Anna donated three scooters.

sm: He misbehaved and received a smack. The painters don't want to smear it. For breakfast, I made John and Vicky smoothies.

sn: The rabbit tripped a snare. The police shot two snipers. We heard Rene snarl.

sl: She always plays at the slot machines. He bought a fancy slab. He wears blue slacks.

A.2 s#C sentences

Cs#C

sp: The elephant likes peanuts. The little girls pets ponies. The caveman saw five mammoths pass.

st: The giant gun shoots tanks. On the weekends he traps turtles. He'll take two oaths tomorrow.

sk: The presenter lacks confidence. He claps cordially. That's Joseph's car.

sm: He said he's ok, but he looks mad. He mistreats Mary. He drops money.

sn: She autographs napkins. The clock beeps normally. In the composition he repeats nouns.

sl: He makes lemonade. He babysits Lori. The comedian laughs loudly.

Vs#C

sp: I'll barbeque boneless pork. In my kitchen there's a mouse playing. My neighbor has a malicious pitbull.

st: Dorothy helped the heartless Tin-man. The new helper has to bus tables. The day after Dracula bit her, the princess turned.

sk: It is illegal to harass colleagues. She found a worthless coin. I gave my son a Swiss car.

sm: The postcard was for Thomas Moore. He proved to be a careless man. They stole the brass mold.

sn: I fell and the class noticed it. They ended up being a landless nation. I'll floss nervously.

sl: She caught her spouse lying. The model has luscious lips. Nobody talks to the homeless loner.

A.3 Filler sentences

The sailor broke his compass.

The newlyweds bought a new mattress.

The boy gave her a kiss.

In the city there are many pedi-cabs.

Susan has three daughters.

The professor has written many books.

I'll quit because I don't like my new boss.

The bottle was full.

The old lady has five cats.

The doctor gave her an injection.

A.4 sC words

sp st sk sm sn sl

spicy start scar small snack slacker

spear steam schedule smith sniff sled

spell stay skin smelt sneeze sleeves

spoiled stone scavenger smolder snooze slug

spooky stool skull smooth snoop sling

A.5 Filler words

average brown proud feeling groom broom

accent operation presumption phantom horse price

automatic ordinary edit foam house treasure

criminal object enjoy under heel green

crowd drawing event useless imagination

culture drag trial ugly ice

brake dress trash graph interest