Journal of Phonetics

Research Article

Lexical competition in vowel articulation revisited: Vowel dispersion in the Easy/Hard database

Susanne Gahl *

Department of Linguistics, 1203 Dwinelle Hall, University of California at Berkeley, Berkeley, CA 94720-2650, USA

ARTICLE INFO

ABSTRACT

Article history: Received 11 July 2013 Received in revised form 6 December 2014 Accepted 11 December 2014 Available online 14 January 2015

Keywords: Vowel dispersion; Vowel formants; Phonological neighbourhood density; Language production; Coarticulation; Pronunciation; Intelligibility

A widely-cited study investigating effects of recognition difficulty on the phonetic realization of words (Wright, 2004: Factors of lexical competition in vowel articulation. In J. Local, R. Ogden & R. Temple (Eds.), Papers in laboratory phonology, Vol. VI (pp. 26-50)) reported that vowel dispersion, i.e. distance from the center of the talker's F1/F2 space, was greater in words that represented difficult recognition targets ('hard' words) than in easy recognition targets ('easy' words). The goal of the current study was to test whether that effect persisted when controlling for other known determinants of F1 and F2. A second goal was to test whether the pattern observed in the recordings analysed in Wright (2004) extended to all monophthongs in the set of recordings of which the words analysed in Wright (2004) formed a subset. We find that the dispersion difference between 'easy' and 'hard' words is expected, given previous observations about effects of phonetic environment on vowel formants. When segmental context is taken into account, recognition difficulty fails to be predictive of vowel dispersion, both in the subset and in the larger database. An analysis of the fitted values of models of F1 and F2 based on consonantal factors (but not recognition difficulty) shows that the formant values predicted by those models separate vowels in "easy" and "hard" words in the manner observed in W2004. We discuss the implications for the effect of phonological neighbourhood density on language production, and for the relationship between lexical retrieval, auditory recognition difficulty and pronunciation variation.

© 2015 Published by Elsevier Ltd.

1. Introduction

The sentence "We speak in order to be understood" appears in works ranging from Phonetics and Psycholinguistics (Hawkins, 2003, p. 376; Pluymaekers, Ernestus, & Baayen, 2005a, p. 146), to the study of the composition and delivery of sermons (Vinet, 1870, p. 369). In Phonetics, its usual role is to highlight the potential of intelligibility as a driving force in pronunciation variation: If we speak in order to be understood, then variation in fine phonetic detail may be due in part to our desire to make our speech as intelligible as possible. A seminal study exploring the relationship of recognition difficulty and pronunciation is Wright (1997, 2004); henceforth W1997 and W2004. Its empirical domain is vowel dispersion, i.e. the distance between vowel tokens and the center of a talker's vowel space as defined by the first two formants. Increased dispersion and vowel space expansion are associated with increased intelligibility (Bradlow, Torretta, & Pisoni, 1996; Ferguson, 2007; Krause & Braida, 2004; Neel, 2008). That observation, and the fact that speakers are able to control articulation so as to increase or decrease dispersion, makes vowel dispersion an attractive means for examining the role of recognition difficulty in pronunciation variation. W2004 examined two groups of CVC words, classified as 'easy' and 'hard' targets for recognition, based on prior research (Luce & Pisoni, 1987; Luce, Pisoni, & Goldinger, 1990; Pisoni, Nusbaum, Luce, & Slowiaczek, 1985). W2004 found vowel dispersion to be greater in 'hard' words than in 'easy' words. The interpretation of that finding offered in W2004 is that "talkers adjust the degree of hyperarticulation to compensate for factors that may impede the intelligibility of a message" (W2004, p. 84). W2004's finding has been widely cited, replicated, and extended to other aspects of pronunciation besides vowel dispersion, such as voice onset time and coarticulatory vowel nasalization (Baese-Berk & Goldrick, 2009; Goldrick, Vaughn, & Murphy, 2013; Harrington, 2010; Kilanski, 2009; Munson, 2007; Munson & Solomon, 2004; Pierrehumbert, 2002; Scarborough, 2010).

* Tel.: +1 510 642 2757; fax: +1 510 643 5688. E-mail address: gahl@berkeley.edu

0095-4470/$-see front matter © 2015 Published by Elsevier Ltd. http://dx.doi.org/10.1016/j.wocn.2014.12.002

However, the dictum that "we speak in order to be understood" must be paired with the similarly appealing observation that "speaking is one of man's most complex skills" (Levelt, 1989, p. 1). Part of that complexity is due to the need to plan, coordinate, and execute the movements of the articulators; part of it is due to the access and retrieval of words from the mental lexicon during utterance planning and production; and part of it is due to the demands imposed by interactions with our interlocutors. All of these sources of complexity - articulation, utterance planning, lexical retrieval, and situational demands - are also documented sources of pronunciation variation (Arnold, Tanenhaus, Altmann, & Fagnano, 2004; Balota, Boland, & Shields, 1989; Bard et al., 2000; Bell et al., 2003; Gahl, 2008; Stevens & House, 1963). The question is how these factors combine.

In many cases, different sources of variation may produce similar effects. For example, high-frequency words are easy targets for auditory recognition (Howes, 1957), as well as for retrieval (Jescheniak & Levelt, 1994; Levelt, Roelofs, & Meyer, 1999; Oldfield & Wingfield, 1965) and articulation (Balota & Chumbley, 1985) in spoken word production. Accordingly, the phonetic reduction of high-frequency words has been linked to semantic and phonological retrieval speed, articulatory routinization, and speakers' minimizing effort without loss of intelligibility, among other explanations (Aylett & Turk, 2004; Bell, Brenier, Gregory, Girand, & Jurafsky, 2009; Gahl, 2008; Lindblom, 1990; Moon & Lindblom, 1994; Pierrehumbert, 2001). In other cases, considering different sources of variation may lead to diverging predictions. One of the lexical variables determining word recognition difficulty in W2004 is a case in point. One property of the 'hard' words in the Easy/Hard database is high phonological neighbourhood density. Phonological neighbourhood density is usually defined as the number of words that differ from a target by substitution, deletion, or addition of one phoneme, often weighted by the relative frequency of target and neighbours (Luce et al., 1990). It is well established that high phonological neighbourhood density inhibits auditory word recognition (Luce & Pisoni, 1998; Pisoni et al., 1985; Vitevitch & Luce, 1998). In spoken word production, on the other hand, there is evidence suggesting that high phonological neighbourhood density facilitates lexical retrieval (Dell & Gordon, 2003; Gordon, 2014; Vitevitch, 2002; Vitevitch & Sommers, 2003). As Dell and Gordon (2003) put it, phonological neighbours in the lexicon are "foes" in auditory word recognition, and "friends" in spoken word production. This asymmetry between auditory word recognition and spoken word production has been shown to be consistent with predictions of a spreading-activation model of lexical access and retrieval (Dell & Gordon, 2003), and has received a general explanation in models implementing joint activation of multiple items in networks (Chen & Mirman, 2012).

The effects of phonological neighbourhood density on auditory word recognition and spoken word production provide a means of gauging the respective role of recognition difficulty and lexical retrieval in pronunciation variation, as argued in Gahl, Yao, and Johnson (2012): To the extent that pronunciation variation reflects talkers' desire to be understood, one might expect talkers to hyperarticulate words in dense phonological neighbourhoods, so as to compensate for the inhibitory effect of high phonological neighbourhood density. That is the pattern observed in W2004, and the interpretation given to it: High phonological neighbourhood density was one of the two lexical properties (along with low frequency) characterizing the 'hard' words in W2004, which were found to be produced with increased vowel dispersion compared to the 'easy' words. The pattern of increased dispersion in 'hard' words is consistent with Lindblom's theory of Hyper- and Hypo-articulation, which holds that speakers vary articulatory effort according to the informational requirements of their listener (Lindblom, 1990). On the other hand, to the extent that pronunciation variation reflects the speed or ease of lexical retrieval and encoding, one might expect words in dense phonological neighbourhoods to undergo phonetic reduction, much in the same way as words of high lexical frequency. That is the pattern observed in a study of conversational speech (Gahl et al., 2012, examining the Buckeye corpus of conversational speech). Gahl et al. (2012) found that, other things being equal, words with many neighbours were shorter in duration and had decreased, not increased, vowel dispersion. Gahl et al. attributed that finding to the facilitatory effect of high neighbourhood density on lexical retrieval. Gahl et al. (2012) speculated that the pattern of seemingly contradictory effects (increased dispersion in single-word production as reported in W2004 vs. decreased dispersion in conversational speech) might reflect the different temporal and attentional demands of single-word naming as compared to conversational speech.

In the current study, we pursue a different possible explanation for the pattern in vowel dispersion observed in W2004: consonantal context. As we discuss in detail below, considerations for effects of consonantal context figured in the item selection in W2004, but not in the quantitative analysis of the data. This raises the possibility that the pattern of increased dispersion in the "hard" words compared to the "easy" words might have arisen due to coarticulation from the consonants flanking the vowels. In other words, the observed pattern of vowel dispersion need not reflect either recognition difficulty or facilitation of lexical access in dense phonological neighbourhoods.

1.1. Central findings in Wright (2004)

Consider Fig. 1, which is a reproduction of Fig. 4.2 in W2004. The plot shows the location of vowels in Easy and Hard words, based on the average F1 and F2 values for each vowel type in the 'easy' and 'hard' words. The prediction W2004 set out to test was that "'easy' words should show a greater degree of reduction than 'hard' words (or conversely that 'hard' words should show a greater degree of hyperarticulation.)" (Wright, 2004, p. 79). Reduction and hyperarticulation were operationalized as vowel dispersion, on the basis of previous studies establishing that increased F1/F2 spaces were associated with increased intelligibility. Vowel dispersion was defined and measured following the criteria and methods in Bradlow et al. (1996). It was found that vowels in 'hard' words had greater dispersion values, on average, than vowels in 'easy' words. That finding was supported by an ANOVA taking into account Vowel, Difficulty, and Talker. W2004 concludes that "There is a clear difference in the degree of dispersion with the vowels from 'hard' words being more dispersed on average than the vowels from 'easy' words. This difference in dispersion represents an overall expansion of the vowel space for 'hard' words."

Fig. 1. Fig. 4.2 in W2004, showing by-vowel mean F1 and F2 values. Vowel symbols followed by '+' indicate means for 'hard' words; plain symbols indicate means for 'easy' words. Reproduced with permission of Cambridge University Press.

Fig. 2. Fig. 4.3 in W2004, showing by-vowel mean dispersion values in 'easy' ("e") and 'hard' ("h") words. Reproduced with permission of Cambridge University Press.

Table 1

"Easy" and "hard" words analysed in W2004.

Vowel   'easy'                          'hard'
/ɑ/     job watch shop                  wad knob cod
/æ/     gas jack path                   pat hack hash
/aɪ/    five wife vice                  rhyme white lice
/aʊ/    mouth                           rout
/e/     gave faith shape page chain     fade dame mace sane wade
/ɛ/     death check leg                 den wed pet
/i/     peace deep teeth                bead tea(t) weed
/ɪ/     give thing ship thick           kit hick kin mitt
/ɔ/     wash                            caught/cot
/o/     both vote                       goat moat
/u/     food                            hoot
/ʌ/     hung judge love rough young     hum pup mum bum bug

The by-vowel comparisons of dispersion in easy vs. hard words are shown in Fig. 2, which is a reproduction of Fig. 4.3 in W2004. W2004 summarizes the pattern of results as follows: "[T]he vowels /i, a, o, u/ (referred to as 'point vowels' henceforth) show the greatest difference between 'easy' and 'hard' words whereas the remainder of the vowels are only slightly expanded or not expanded at all." (W2004:82).

Vowel dispersion was measured at the point of maximal displacement, i.e. "when F1 and F2 are the most characteristic for that particular vowel" (W2004:80). Recognizing that neighbouring consonants can affect vowel formants, W2004 excluded the initial and final 50 ms from the window of analysis in which the point of maximal displacement was to be located. W2004 further excluded words with postvocalic /l/ and /r/. In addition, W2004 states that "nasal codas were balanced in both sets".

Table 1 shows the list of items in W2004. It will be observed that only four of the 'easy' words (chain, thing, hung, young) and eight of the 'hard' words (rhyme, dame, sane, den, kin, hum, mum, bum) contain nasal codas. Only one of the 'easy' words (mouth), and five of the hard words (knob, mace, mitt, moat, mum), have initial nasals. Nasal consonants are thus not balanced across the easy vs. hard sets of words. None of the point vowels, i.e. the vowels that showed the greatest difference between the 'easy' and 'hard' condition, occur before nasal codas in the word list. The weak or absent difference between the 'easy' and 'hard' condition for the remaining vowels could conceivably be due to the occurrence of nasal codas after /aɪ, e, ɛ, ɪ, ʌ/. The uneven distribution of nasals in the word list may either have obscured a real effect of recognition difficulty or contributed to an apparent effect.

Table 1 further shows that several other consonantal features, such as place of articulation and voicing, are also not matched across pairs of words with the same vowel and were unevenly distributed across the two sets of words. For example, only 11 of the 'easy' words, and 23 of the 'hard' words, end in oral stops; only 3 of the 'hard' words, and 16 of the 'easy' words, end in fricatives. If manner of articulation (stop vs. fricative) or nasality of final consonants affects the location of vowels in F1/F2 space, then the uneven distribution of consonant types across the two classes of words may potentially bring about an apparent effect of recognition difficulty. More generally, the incomplete crossing of consonant-vowel combinations with recognition difficulty means that the by-vowel mean dispersion of 'easy' and 'hard' words may conceivably be due to differences other than the easy/hard distinction.

Naturally, a perfect match of phonological characteristics across the two sets is impossible, given the requirement that the two sets must differ in phonological neighbourhood density. The question is whether the differences in phonetic contexts could result in higher dispersion in 'hard' vs. 'easy' words. To understand whether phonetic context effects may have been the source of the observed differences in vowel dispersion, we turn to previous studies of phonetic context effects. If the size and direction of effects observed in the Easy/Hard database parallel those in other studies, then the role of recognition difficulty in the Easy/Hard database deserves closer scrutiny.

1.2. Consonantal effects on the first two formants

It is well established that vowel formants are affected by consonant context, not just in the immediate vicinity of a consonant, but also at the vowel's temporal midpoint (e.g. Hillenbrand, Clark, & Nearey, 2001; Stack, Strange, Jenkins, Clarke III, & Trent, 2006; Stevens & House, 1963; Strange, Weber, Levy, Shafiro, & Hisagi, 2007). Most of the literature on the effect of consonantal context on vowel formants has considered F1 and F2, rather than vowel dispersion, i.e. Euclidean distance from the center of a talker's F1/F2 space.

Consonantal factors known to affect F1 in American English include voicing, nasalization, and place of articulation. Stevens and House (1963) and Hillenbrand et al. (2001), focusing on "symmetrical" environments, i.e. vowels flanked by identical consonants, observed F1 to be lower between voiced consonants than between voiceless ones. The degree of F1-lowering varied by vowel and was generally stronger in mid and low vowels than in high vowels, e.g. ca. 75 Hz in /ɑ/, ca. 15-20 Hz for other back/central vowels, ca. 90 Hz for /ɪ/, 90-120 Hz for /ɛ/, 70-100 Hz for /æ/, and negligible for /i/ in Hillenbrand et al. (2001). Vowel nasalization tends to 'centralize' vowel height, i.e. result in increased F1 (i.e. vowel lowering) of high vowels, and decreased F1 (i.e. raising) of non-high vowels (Beddor, 1983; Chen, 1997; Flemming, 2010a; Fourakis, 1991). Effects of place of articulation on F1 are generally reported to be subtle by comparison to the effects of voicing and nasalization, but there is some evidence for a downward shift in F1 for /ɛ/ and /æ/ between alveolars and velars, compared to labials and vowels in 'null' environments (i.e. in the environment /hVd/ or produced in isolation; Stevens & House, 1963; Hillenbrand et al., 2001). F1 in mid-to-low vowels (/æ, ɑ, ɔ, ʌ/) has also been found to be lower after alveolar stops than after labial and velar stops (Strange et al., 2007).

Consonantal factors known to affect F2 include place and manner of articulation, and, to a lesser extent, voicing. Hillenbrand et al. (2001) found effects of place of articulation on F2 to be strongest in the high back vowel /u/, for which F2 was higher after and between alveolar consonants than near consonants at other places of articulation (on average by 214 Hz for men and 281 Hz for women); similar patterns are reported in Stevens and House (1963) and Strange et al. (2007). To a lesser degree, increased F2 in other back and central vowels has also been found before and after alveolar and velar stops (Hillenbrand et al., 2001; Strange et al., 2007). F2 in the front vowels /i, ɛ/ has been found to be lower near labials than near other places of articulation (Hillenbrand et al., 2001; Stevens & House, 1963). F2 has also been reported to be affected by voicing, although not consistently or strongly. Stevens and House (1963) found that F2 was higher for front vowels near voiced consonants than near voiceless ones. Back vowels showed little or no effect of voicing. Hillenbrand et al. (2001) found slightly higher F2 values near voiced consonants than near voiceless ones, particularly in back vowels; however, in the statistical analysis, initial voicing accounted for only about 0.1% of the total variability, while final voicing failed to clear the threshold for statistical significance in that study. As for manner of articulation, Stevens and House (1963) compared vowels near oral stops vs. fricatives and found that the size and direction of the effect varied with vowel type: In front vowels, especially /i/ and /ɛ/, F2 tended to be higher near stops than near fricatives; for the high back vowels /u/ and /ʊ/, F2 was higher near fricatives than near stops. Hawkins and Slater (1994), focusing on the vowels /a, u, i, a/ preceded by either /r/ or /z/ and followed by voiced stops, found F2 to be significantly lower following /r/ than following /z/. This pattern was strongest in /u/.

Place of articulation of adjoining consonants can affect vowel formants both when the consonants precede and when they follow the vowels, though there is some evidence suggesting that effects of place of articulation of preceding consonants may be stronger than effects of consonants following the vowels (Hillenbrand et al., 2001), at least when CVC syllables are produced one at a time, as opposed to in running speech. For other features, such as voicing (Hillenbrand et al., 2001) and nasality (Scarborough, 2013) carryover effects may be as strong as anticipatory ones.

Consonantal predictors of F1 and F2 values do not tend to have across-the-board effects on vowel dispersion, e.g. uniformly increasing or decreasing distance from the center of F1/F2 space. Instead, consonant-conditioned changes in F1 or F2 and the resulting effects on dispersion vary by vowel type. For example, nasality may result in low vowels being raised (F1 lowering) and high vowels being lowered (F1 increasing). Changes in vowel dispersion as a function of consonantal context or of other predictors of F1 and F2 are rarely explicitly modelled. An exception is Aylett and Turk (2006), which spells out the predicted relationship between changes in F1 and F2 and dispersion for each of the vowels /a, ɛ, i, u/.

Other factors do affect overall vowel space expansion, including vowel duration, speaking rate, and talker sex. Other things being equal, vowel spaces tend to be more compact, i.e. less dispersed, at faster speaking rates (Fourakis, 1991; Lindblom, 1983; Moon & Lindblom, 1994; Stack et al., 2006), and in the speech of male vs. female talkers (Byrd, 1994). High vowel dispersion, and vowel space expansion as quantified by a variety of criteria, has also been found to be one of the acoustic characteristics of 'clear speech' (Bradlow et al., 1996; Ferguson, 2007). The connection between clear speech and vowel dispersion motivated the claim in W2004.

1.3. The current study

Given the consonantal effects on vowel formants, it is conceivable that the data examined in W2004 reflected not just whatever effect recognition difficulty may have, but also consonant context. Since consonant context was not taken into account in the statistical analysis in W2004, and since context, vowel type, and recognition difficulty were not fully crossed, it is possible that the overall pattern of higher dispersion in 'hard' words than in 'easy' words was due to the combined effects of segmental context and the distribution of vowels across consonant types.

A way to escape the constraints imposed by a particular word list is to extend the investigation to larger sets of words. As it happens, W2004 was based on a subset of a larger dataset, known as the Easy/Hard database (Torretta, 1995). The relationship between the W2004 subset and the full Easy/Hard database thus affords the opportunity to analyse additional data with W2004's observations as a baseline. The present study consists of a replication and extension of the analyses in W2004, based on the same set of recordings analysed in W2004. One goal of the current study was to investigate whether the pattern of greater vowel dispersion in 'hard' vs. 'easy' words was to be expected, given the consonants in the target words, and whether the effect of recognition difficulty persisted when segmental context was taken into account. We begin by fitting regression models of F1 and F2. We then use the fitted values of those models to predict the location in F1/F2 space of vowels in 'easy' and 'hard' words. A related goal was to examine whether the observed pattern of dispersion in 'easy' vs. 'hard' words extended to other parts of the Easy/Hard database.

2. Materials and methods

2.1. Recordings and word lists

All recordings came from the Easy/Hard database (Torretta, 1995). The full database consists of 4500 audio files, representing 150 word types, read by ten talkers at three speaking rates. The recordings were made at the Speech Research Laboratory at Indiana University. No information about the talkers' linguistic background is given in Torretta (1995). W1997 states that the talkers represented a variety of dialects, all characterized as "General American English", and asserts that "all the dialects had the same vowel-quality categories in all of the stimuli". Details of the recording procedure are described in Torretta (1995).

The word lists were constructed on the basis of previous research on word recognition, specifically of the effects of lexical familiarity, lexical frequency, and phonological neighbourhood structure on recognition difficulty (Luce & Pisoni, 1987; Luce et al., 1990; Pisoni et al., 1985). Phonological neighbourhood structure is captured by two related variables: (a) phonological neighbourhood density and (b) neighbourhood frequency. Phonological neighbours are words in the lexicon that differ from a target by addition, deletion, or substitution of one phoneme. For example, the neighbours of pat include the words cat, pot, spat, and pan. Phonological neighbourhood density refers to the number of neighbours of a target. Neighbourhood frequency was defined as the mean word frequency of a target's neighbours. The 150 word types consisted of two sets of 75 words, termed 'easy' and 'hard', on the basis of recognition difficulty. "Easy" words, i.e. easy targets for recognition, were high-frequency words facing little competition from their neighbours, i.e. with low neighbourhood density and low neighbourhood frequency, relative to the target frequency. "Hard" words, i.e. difficult recognition targets, were low-frequency words with many neighbours and high neighbourhood frequency. Lexical familiarity was held constant across the two groups: both groups had very high familiarity ratings (greater than 6.7 on a seven-point scale). Estimates of familiarity, frequency, and phonological neighbourhood structure were based on the Hoosier Mental Lexicon (Nusbaum, Pisoni, & Davis, 1984).
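
For concreteness, the sketch below counts one-phoneme neighbours (by substitution, addition, or deletion) over a toy lexicon of space-separated phoneme strings, using the example of pat given above. The lexicon, the function, and its name are purely illustrative; this is an unweighted count, not the Hoosier Mental Lexicon or the procedure used to construct the database.

```r
# A minimal sketch, assuming a toy lexicon of space-separated phoneme strings.
one_phoneme_apart <- function(a, b) {
  a <- strsplit(a, " ")[[1]]
  b <- strsplit(b, " ")[[1]]
  if (length(a) == length(b)) return(sum(a != b) == 1)   # substitution of one phoneme
  if (abs(length(a) - length(b)) != 1) return(FALSE)
  long  <- if (length(a) > length(b)) a else b            # addition/deletion: removing one
  short <- if (length(a) > length(b)) b else a            # phoneme from the longer form
  any(sapply(seq_along(long), function(i) identical(long[-i], short)))
}

lexicon <- c(pat = "p ae t", cat = "k ae t", pot = "p aa t", spat = "s p ae t", pan = "p ae n")
neighbours <- sapply(lexicon[names(lexicon) != "pat"], one_phoneme_apart, b = lexicon[["pat"]])
sum(neighbours)   # neighbourhood density of 'pat' in this toy lexicon: 4
```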

The subset of the Easy/Hard database analysed in W1997 and W2004 consisted of 68 words (34 'easy' ones and 34 'hard' ones) from the original set of 150, spoken by the ten talkers at the 'medium' rate, for a total of 680 tokens out of the 4500-token database. As in the full database, the 'easy' words had few neighbours and were of high lexical frequency relative to their neighbours, whereas the 'hard' words had many neighbours and were of low lexical frequency relative to their neighbours.

In our reanalysis of the audio files, one file was found to be corrupt and had to be excluded from the analysis. Another file had to be excluded because the talker produced the word mail instead of the target mall. There were a small number of discrepancies between the audio files and the description of the data base in Torretta (1995). The first discrepancy was that three tokens of the word "job" were coded as containing the vowel [o], despite the fact that none of the talkers pronounced job with that vowel (as might be the case if they said the biblical name 'Job'). These three tokens were re-coded as containing the same vowel as the words shop, watch, cod, knob, and wad. Secondly, the item bag appears in the stimulus lists in Torretta (1995) and Wright (1997, 2004), but the corresponding recordings appeared to be the word hag for all talkers, with a period of audible frication before the vowel. It is no longer possible to recover whether the discrepancy is due to an error in stimulus description, stimulus presentation, participant error, or some other factor. The word wrong appeared on both the 'easy' list and the 'hard' list. For this reason, it was excluded from all current analyses (the item was also excluded in W2004).

The word lists in W1997 vs. W2004 differ slightly from one another. W2004 lists the item tea, but talkers said "teat", consistent with the word list and with the description of the database as consisting of CVC words in Torretta (1995). Secondly, W2004 lists the orthographic form cot, but not caught, whereas W1997 lists caught, but not cot. The item 'caught' is in fact the only 'hard' word transcribed with the symbol /ɔ/ in W2004, paired with the sole 'easy' word with that vowel, wash. Torretta (1995) and Bradlow and Pisoni (1999), both of which are based on the Easy/Hard database, list the item in question as cot. In the current study, the spelling cot (and the corresponding lexical frequency) is assumed, because Torretta (1995) was closer in time to the data collection phase, and because the item in question appears on the list of "hard" words: since caught has a fairly high lexical frequency, it is unclear whether it would have met the inclusion criteria for the "hard" set.

Table 2

Mean (range) of lexical frequency and neighbourhood density of 'easy' and 'hard' words in Torretta (1995), Wright (1997) and the current study.

                                      Lexical frequency                               Neighbourhood density
                                      Easy                     Hard                   Easy           Hard
Torretta (1995), n = 150 word types   384.84 (0.59-5654.73)    10.73 (0.31-171.45)    14.47 (1-31)   27.75 (8-45)
Wright (1997), n = 68 word types      218.25 (13.98-1167.82)   12.05 (0.31-171.45)    14.0 (4-28)    26.91 (8-43)
Current study, n = 125 word types     434.31 (0.59-5654.73)    8.88 (0.31-42.73)      14.95 (1-31)   28.25 (16-45)

The treatment of the cot/caught item raises the more general issue of the pronunciation and distinctness of the low/back vowels (Clopper, Pisoni, & de Jong, 2005; Clopper & Tamati, 2014; Labov, Ash, & Boberg, 2005; Thomas, 2001). We inspected plots of each talker's F1 and F2 in words transcribed with /ɔ/ and /ɑ/, to find out whether talkers made reliable and similar distinctions between these two vowel types. There was a great deal of overlap in the realization of these vowels, both within talkers and within items. We therefore fitted each model twice, once with /ɔ/ and /ɑ/ as separate types, and once with a "merged" low/back vowel type. The pattern of significant fixed effects was unchanged, regardless of whether /ɔ/ and /ɑ/ were treated as one vowel or two. To facilitate comparison to Wright (2004), the models reported here treated /ɔ/ and /ɑ/ as separate vowel types.

Another vowel for which the assumption of dialect homogeneity may have been an oversimplification is the vowel /æ/, which for many speakers is split into two or even three clusters that differ in vowel length, height, and frontness (Clopper et al., 2005; Labov et al., 2005; Thomas, 2001). Inspection of the F1/F2 plots for that vowel in each of the talkers suggested that this vowel may actually be appropriately modelled as two, or even three, separate vowel targets. To facilitate comparison to W2004, we treated /æ/ as a single vowel.

There were 8 items with the diphthongs /aɪ/ and /aʊ/ in the W1997/2004 subset. Words containing these diphthongs were excluded from the present analysis. In part, this decision was due to the difficulties in getting reliable formant measurements, and in part it was due to the absence of these diphthongs from previous studies of phonetic context effects. Excluding these words left 60 word types (600 tokens) in the set for the replication of W1997, and 125 types (3734 tokens) for the analysis of the larger database.

2.2. Formant measurements

A research assistant with training in phonetic analysis measured the first two vowel formants for the set of 600 tokens, following the methods described in W2004, i.e. measuring the formants at the point of maximal displacement, after excluding the initial and final 50 ms of the vowel. The research assistant was unaware of the origin of the sound files and of the aims of the analysis. Cases where the research assistant indicated "having trouble" deciding on the point of maximal displacement or identifying formant peaks were marked as "problematic". All statistical analyses were carried out with and without the problematic cases. A total of 63 observations were marked as "problematic". Almost half of these (28 tokens) contained the vowel [e], presumably reflecting the diphthongal nature of this vowel. The vowels [i], [u], and [ɔ] yielded one problematic token each, and [ɪ] two. The other problematic cases were approximately evenly distributed across vowel types, with 5-7 problematic cases in each set. All statistical analyses were carried out after removing the problematic observations and then repeated for the whole dataset, with very similar results.

To facilitate the analysis of a larger data set, we used automatic alignment and formant extraction. To that end, the audio files for the Torretta Easy/Hard database were aligned with the broad transcriptions of the words at the phone level using the Penn Phonetics Lab Forced Aligner Toolkit (Yuan & Liberman, 2008). The start and end times of each vowel phone were obtained from the alignment results, and a portion of each token's audio file was extracted, starting 40 ms before the start time and ending 40 ms after the end time of the vowel. This audio was downsampled to 12 kHz and analysed by the Watanabe and Ueda formant tracker (Ueda, Hamakawa, Sakata, Hario, & Watanabe, 2007; Watanabe, 2001). Measurements for F1-F4 and F0 were recorded for the analysis frame occurring at the temporal midpoint of the vowel. In four cases, the automatic tracking resulted in F1 values of zero Hz or below 2 on the Bark scale. In seven cases, formant tracking errors resulted in missing F1 or F2 values. These tokens were excluded from further analysis.
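
As an illustration of the midpoint measurement and the exclusion criteria just described, the sketch below selects the tracker frame nearest the vowel midpoint and discards tokens with missing formant values or with F1 at zero Hz or below 2 Bark. The data-frame layout and the function names are assumptions, and the Bark conversion shown (Traunmüller, 1990) is one common choice; the database documentation does not specify which conversion was used.

```r
# A minimal sketch, assuming one token's tracker output in a data frame `frames`
# with columns time (s), F1 and F2 (Hz); column names and the Bark conversion
# (Traunmueller 1990) are illustrative assumptions.
bark <- function(f) 26.81 * f / (1960 + f) - 0.53

midpoint_measure <- function(frames, vowel_start, vowel_end) {
  mid   <- (vowel_start + vowel_end) / 2
  frame <- frames[which.min(abs(frames$time - mid)), ]   # analysis frame nearest the vowel midpoint
  failed <- any(is.na(c(frame$F1, frame$F2))) ||         # missing F1/F2,
    frame$F1 <= 0 || bark(frame$F1) < 2                  # F1 of 0 Hz, or F1 below 2 Bark
  if (failed) return(NULL)                               # exclude tracking failures
  data.frame(F1 = frame$F1, F2 = frame$F2)
}
```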

We first compared the by-hand and automatic measurements for the 600 tokens from the W1997 subset and fitted statistical models of vowel dispersion to each set of measurements. Since the two sets of measurements produced very similar patterns, we also report the analysis of the automatic measurements for the larger dataset, i.e. 125 word types (3734 tokens) from the Easy/Hard database.

2.3. Item characteristics and descriptive statistics

Table 2 shows the mean values and the ranges of the frequency and neighbourhood density of all word types in the Easy/Hard database, and of the subsets analysed in W1997 and in the current study. For the most part, the means of the lexical frequency and neighbourhood density of easy vs. hard words are similar across samples. An exception is the lexical frequency of 'hard' words, which is lower in the sample for the current study than in either W1997 or Torretta (1995). Since low lexical frequency is one of the characteristics of the 'hard' set, the lower frequency of the 'hard' words in the current study should, if anything, aid in replicating any effect of recognition difficulty.

Vowel dispersion was calculated as the Euclidean distance between the point defined by the F1 and F2 (Bark) values of each vowel token and the talker's average F1 and F2 (Bark) values, following the method proposed in Bradlow et al. (1996). To trace the method used in W2004 as closely as possible, talker-specific vowel centres for the replication of the W2004 subset were calculated using only the words that also entered into Wright's analysis. For the analysis of the larger set, talker-specific vowel centres were calculated using all analysable monophthongs. Table 3 shows the token counts for the 'easy' and 'hard' conditions, and the means and standard deviations of F1, F2, and dispersion for each vowel.
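
The dispersion measure itself reduces to a Euclidean distance in the Bark-scaled F1/F2 plane. A minimal sketch, assuming a data frame `d` with one row per vowel token and columns Talker, F1_bark, and F2_bark (the column names are illustrative):

```r
# Talker-specific centres from the mean F1/F2 (Bark) over the relevant token set,
# then Euclidean distance of each token from its talker's centre.
centres <- aggregate(cbind(F1_bark, F2_bark) ~ Talker, data = d, FUN = mean)
names(centres)[2:3] <- c("F1_centre", "F2_centre")
d <- merge(d, centres, by = "Talker")
d$dispersion <- sqrt((d$F1_bark - d$F1_centre)^2 + (d$F2_bark - d$F2_centre)^2)
```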

2.4. Statistical treatment of the data

For the statistical analysis, we fitted linear mixed-effects regression models. The general aim and structure of mixed-effects regression models, i.e. models containing random effects along with fixed effects, can be summarized as follows: For fixed effects, model estimates describing the relationship between each level of a categorical predictor and the outcome variable can vary freely: The models treat the estimates for each factor level as a separate parameter, without imposing any constraints on how the levels might differ from one another. Random effects, by contrast, are predictors whose values are treated as random samples from a larger population. Random intercepts model differences in the "baseline" dispersion, for example across vowels (e.g. /i/ is typically more distant from the center of F1/F2 space than /a/ is), or across talkers (some talkers have larger vowel spaces than others). Along with random intercepts, mixed-effects models may include random slopes. The assumption is that, for example, talker-specific "baseline" values of vowel dispersion (modelled as the random intercepts), as well as talker-specific variation in the size and direction of the fixed effects (modelled as random slopes) represent normally distributed random variables with means equal to the population mean for each (intercept and slope) and unknown variance, estimated by the model.

A contentious set of issues in regression modelling concerns the sequence and the criteria by which predictors are added to or eliminated from a model. One issue that has received a great deal of attention concerns the principles driving the specification of the random effects structure (Barr, Levy, Scheepers, & Tily, 2013; Gelman & Hill, 2006; Harrell, 2001). Unless noted otherwise, e.g. in follow-up analyses probing the behaviour of specific variables of interest, we used backward elimination of fixed effects in a model with random intercepts only, followed by forward entry of (by-vowel and/or by-talker) random slopes for the fixed effects that were retained in the initial backward elimination. Since the presence of random slopes can render fixed effects non-significant, the model resulting from forward entry of random slopes was then subjected once more to backward elimination of fixed effects. The criterion for model selection was the AIC (Akaike Information Criterion, Akaike, 1974), a measure comparing models based on goodness-of-fit and model complexity.
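
A minimal sketch of the comparisons underlying this procedure is given below, using lme4 directly rather than the LMERConvenienceFunctions routine; the data frame `d` and the column names are assumptions, and models that differ in their fixed effects are fitted with maximum likelihood so that their AIC values are comparable.

```r
library(lme4)

# Backward elimination: does dropping the Difficulty-by-Vowel interaction lower the AIC?
m_int  <- lmer(dispersion ~ Difficulty * Vowel + (1 | Talker) + (1 | Vowel),
               data = d, REML = FALSE)
m_main <- lmer(dispersion ~ Difficulty + Vowel + (1 | Talker) + (1 | Vowel),
               data = d, REML = FALSE)
AIC(m_int, m_main)

# Forward entry: does a by-talker random slope for Difficulty lower the AIC further?
m_slope <- lmer(dispersion ~ Difficulty + Vowel + (1 + Difficulty | Talker) + (1 | Vowel),
                data = d, REML = FALSE)
AIC(m_main, m_slope)
```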

Table 3

Descriptive statistics of F1, F2, and dispersion of vowels in the W2004 dataset. F1 and F2 are given in Bark.

Vowel  Difficulty  Token count  Male: F1 mean (sd)  F2 mean (sd)   Dispersion mean (sd)  Female: F1 mean (sd)  F2 mean (sd)   Dispersion mean (sd)
/i/    easy        30           3.5 (0.28)          13.78 (0.4)    2.96 (0.29)           4.12 (0.91)           15.05 (0.92)   3.17 (0.89)
/i/    hard        30           3.46 (0.57)         13.93 (0.41)   3.13 (0.37)           3.66 (0.25)           15.21 (0.47)   3.51 (0.34)
/ɪ/    easy        39           4.8 (0.57)          12.62 (0.53)   1.37 (0.53)           5.15 (0.7)            13.84 (0.68)   1.56 (0.82)
/ɪ/    hard        39           4.79 (0.59)         12.77 (0.48)   1.51 (0.44)           5.25 (0.74)           13.87 (0.52)   1.53 (0.65)
/e/    easy        40           4.79 (0.59)         12.91 (0.57)   1.66 (0.54)           5.2 (0.87)            13.95 (1.08)   1.79 (0.69)
/e/    hard        31           4.72 (0.53)         12.91 (0.61)   1.71 (0.58)           5.63 (0.86)           14.25 (0.71)   1.76 (0.55)
/ɛ/    easy        25           5.75 (0.56)         11.94 (0.45)   0.85 (0.23)           6.51 (0.75)           12.88 (1.02)   0.94 (0.65)
/ɛ/    hard        28           5.52 (0.69)         11.37 (2.86)   1.61 (2.33)           6.64 (0.75)           13.27 (0.33)   1.01 (0.31)
/æ/    easy        27           5.88 (1.08)         11.97 (1.47)   1.56 (0.99)           7.6 (0.88)            13.2 (0.45)    1.7 (0.53)
/æ/    hard        28           6.61 (0.47)         12.14 (0.24)   1.55 (0.38)           8.06 (0.68)           13.21 (0.56)   1.99 (0.42)
/ʌ/    easy        48           5.97 (0.74)         9.86 (0.72)    1.82 (0.7)            6.95 (0.84)           11.61 (0.72)   1.56 (0.63)
/ʌ/    hard        45           5.86 (0.98)         9.33 (0.43)    2.35 (0.45)           7.26 (0.94)           11.06 (0.88)   2.14 (0.68)
/ɑ/    easy        24           6.44 (0.39)         10.16 (0.8)    1.78 (0.66)           7.94 (0.84)           11.16 (0.63)   2.45 (0.51)
/ɑ/    hard        29           6.66 (0.64)         10.09 (0.7)    2.12 (0.47)           7.52 (1.07)           10.34 (1)      3.01 (0.57)
/ɔ/    easy        9            6.22 (0.38)         8.67 (0.5)     2.93 (0.6)            7.58 (0.99)           10.39 (1)      2.84 (0.26)
/ɔ/    hard        10           6.97 (0.44)         10.42 (0.46)   1.99 (0.38)           7.68 (0.98)           10.38 (0.97)   2.94 (0.76)
/o/    easy        18           5.31 (0.43)         9.05 (0.36)    2.39 (0.43)           6.52 (0.58)           11.17 (0.39)   1.73 (0.32)
/o/    hard        17           5.22 (0.69)         9.52 (0.92)    1.95 (0.97)           6.02 (1.18)           10.81 (1.78)   2.23 (1.94)
/u/    easy        9            4.08 (0.64)         9.88 (1.05)    2.05 (1.01)           4.72 (0.27)           12.75 (0.44)   1.53 (0.29)
/u/    hard        10           3.91 (0.46)         9.52 (0.75)    2.39 (0.58)           4.31 (0.42)           12.13 (0.45)   1.98 (0.65)

Fig. 3. Mean F1 and F2 values for vowels in 'easy' and 'hard' words (see text). Dark symbols indicate vowels in "hard" words. Light grey symbols indicate vowels in "easy" words.

Fig. 4. By-vowel median F1 and F2 values for vowels preceded by stops and fricatives spoken by male talkers (top panel) and for vowels preceded by alveolars and labials (bottom panel) spoken by female talkers.

All statistical analyses were performed using R (R Development Core Team, 2008) and the R packages languageR (Baayen, 2008), lme4 for mixed-effects modelling (Bates & Maechler, 2010), and LMERConvenienceFunctions for an implementation of the backward-elimination and forward-entry routine (Tremblay & Ransijn, 2013). Normality and homogeneity of the residuals were checked by visual inspection of plots of residuals against fitted values. Observations with large residuals (more than 2.5 SDs) were removed at each modelling step and the model refitted without those cases. Continuous variables (F1, F2, and dispersion) were centred around their means. Treatment coding was used for all factors. Difficulty ('easy' vs. 'hard', based on the classification in Torretta, 1995) was included as a binary factor in models testing that predictor. While it would be desirable to treat recognition difficulty as a continuous variable, doing so would have been problematic here, given that words of medium frequency or neighbourhood density are likely underrepresented in the Easy/Hard database.
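
The trimming-and-refitting step and the centring of continuous variables can be sketched as follows, for a fitted model object `m` and a data frame `d` containing exactly the rows used to fit it (names are illustrative); the residual plot stands in for the visual checks mentioned above.

```r
# Centre a continuous variable on its mean (illustrative column name)
d$dispersion_c <- as.numeric(scale(d$dispersion, scale = FALSE))

# Visual check of residuals against fitted values
plot(fitted(m), resid(m))

# Remove observations whose residuals exceed 2.5 SD and refit the model without them
keep    <- abs(resid(m)) < 2.5 * sd(resid(m))
m_refit <- update(m, data = d[keep, ])
```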

3. Results

3.1. Preliminary analyses of the W2004 dataset

To gauge the degree of consistency between the current study and the earlier analyses, we first sought to replicate W1997/2004, restricting our attention to words that were included in that study (excluding the diphthongs [au], [ai], and [oi]), and using the statistical model in W1997/2004 before taking any additional predictors of formant values into account. We excluded cases from the analyses that the research assistant had marked as 'problematic'.

Fig. 3 shows the average F1 and F2 values, pooled across all talkers, for each vowel in 'easy' and 'hard' words, analogous to Fig. 4.2 in Wright (2004) (Fig. 1 above). As in W2004, there appeared to be a tendency for vowels in 'hard' words to be more peripheral than vowels in 'easy' words. The vowel /ɔ/, i.e. the averages based on the items cot/caught ('hard') and wash ('easy'), did not conform to this tendency: its mean F2 value was considerably higher in the 'hard' condition than in the 'easy' condition. Nevertheless, the average dispersion in the W2004 part of the data set, pooled across all talkers and vowel types, was higher for the 'hard' words than for the 'easy' words (2.08 vs. 1.84 in the by-hand measurements), as in W2004. This tendency was particularly evident in the case of [i], [u], [æ], and [a].

Table 4

Fixed effects in a (preliminary) mixed-effects model of vowel dispersion, by-hand measurements.

                                  Beta (SE)        t
(Intercept)                       -0.24 (0.156)    -1.543
Difficulty: Hard                  0.554 (0.114)    4.881
Vowel: /ɑ/                        0.433 (0.221)    1.961
Vowel: /æ/                        -0.207 (0.219)   -0.947
Vowel: /ɔ/                        1.225 (0.264)    4.646
Vowel: /ɛ/                        -0.887 (0.221)   -4.02
Vowel: /e/                        0.033 (0.209)    0.155
Vowel: /ɪ/                        -0.222 (0.21)    -1.059
Vowel: /i/                        1.451 (0.216)    6.724
Vowel: /o/                        0.402 (0.23)     1.747
Vowel: /u/                        0.109 (0.264)    0.414
Difficulty × Vowel: Hard, /ɑ/     -0.127 (0.189)   -0.671
Difficulty × Vowel: Hard, /æ/     -0.3 (0.187)     -1.6
Difficulty × Vowel: Hard, /ɔ/     -1.007 (0.276)   -3.654
Difficulty × Vowel: Hard, /ɛ/     -0.363 (0.191)   -1.9
Difficulty × Vowel: Hard, /e/     -0.52 (0.173)    -3.002
Difficulty × Vowel: Hard, /ɪ/     -0.501 (0.168)   -2.984
Difficulty × Vowel: Hard, /i/     -0.377 (0.182)   -2.069
Difficulty × Vowel: Hard, /o/     -0.884 (0.219)   -4.028
Difficulty × Vowel: Hard, /u/     -0.17 (0.276)    -0.616

Table 5

Random effects in a (preliminary) mixed-effects model of vowel dispersion, by-hand measurements.

Random effect        Variance   SD
Vowel (intercept)    0.015      0.123
Talker (intercept)   0.030      0.173
Residual             0.299      0.547

Fig. 4 is based on the same data as Fig. 3, this time plotting the by-vowel medians grouped by manner (top panel) and place (bottom panel) of articulation, for male and female talkers, respectively. As the figure illustrates, there are differences based on each of these features that are comparable in magnitude to the differences between the easy vs. hard average F1/F2 values. It should be kept in mind that the material plotted here differs in important ways from the materials in studies such as Hillenbrand et al. (2001) or Stevens and House (1963). In those studies, consonantal features were manipulated one at a time. In the words in the W2004 database, that is not the case: For example, words containing initial alveolar consonants differ in voicing and manner of articulation of the initial and final consonants, and in place of articulation of the final consonant.

Statistical analysis retracing the steps in W2004 revealed the same pattern reported in W2004: A repeated-measures ANOVA shows a significant main effect of difficulty, F(1,9) = 10.65, p = 0.001. As in W2004, there were also significant main effects of talker and vowel type, as well as a significant interaction between difficulty and vowel type. The F statistics given in W2004 are "(F(1,480) = 130.92, p<0.0001)" for the main effect of difficulty, and "F(9,480) = 15.22, p<0.0001" for the interaction of difficulty and vowel type. The analysis reported in W1997 and W2004 did not include separate by-talker and by-item F-tests and did not take by-talker error into account, as indicated by the degrees of freedom in the error terms (resulting in a violation of the independence assumption). However, the pattern of significant main effects and interactions was unchanged when that decision was corrected.

Thus, the preliminary analyses yield a replication of the pattern reported in W2004. The question is what role recognition difficulty plays when other predictors of vowel dispersion are controlled for. By way of ensuring that any changes in the estimate of the role of recognition difficulty are not simply due to the change from ANOVA to mixed-effects modelling and the accompanying changes in modelling assumptions, we fit a mixed-effects regression model taking into account only those predictors and interactions that figured in the ANOVA. The model contained difficulty, vowel type, and the interaction between them as fixed effects, along with random by-talker and by-vowel intercepts. We also fitted models with by-talker and by-vowel random slopes for difficulty, allowing the model to capture variation in the size and direction of the effect of difficulty for each talker and each vowel. The correlation parameters of the resulting models indicated that the random intercepts and slopes were perfectly collinear, and that the variance of the random slope was near zero, so we removed the random slopes. The behaviour of the crucial fixed effect ("Difficulty") was unchanged, suggesting that each model recovered the general tendency, across talkers and vowels, for dispersion to be higher in 'hard' words than in 'easy' words. An alternative model with Vowel and the Vowel x Difficulty interaction as fixed effects supported the same overall observation about the effect of Difficulty as the other models: Dispersion was higher in 'hard' words than in 'easy' words when consonantal factors were not taken into account.

Table 6

Fixed effects in a mixed-effects regression model of F1, after backward elimination.

Effect        Level        Beta (SE)        t
(Intercept)                -0.485 (0.261)   -1.863
Sex           Female       0.667 (0.143)    4.672
Height        [high]       -1.464 (0.374)   -3.916
Height        [low]        1.425 (0.344)    4.145
Voice_init    [voiced]     -0.327 (0.048)   -6.800
Nasal_fin     [nasal]      -0.177 (0.064)   -2.749
Place_fin     [alveolar]   -0.136 (0.059)   -2.313
Place_fin     [velar]      -0.088 (0.074)   -1.189

Table 7

Random effects in a mixed-effects model of F1, after backward elimination.

Random effect                   Variance   SD
Vowel (intercept)               0.151      0.388
Talker (intercept)              0.079      0.281
Talker: Vowel height [+high]    0.081      0.284
Talker: Vowel height [+low]     0.277      0.527
Residual                        0.234      0.484

The model is summarized in Tables 4 and 5. The estimated coefficient for the fixed effect 'Difficulty' was positive and significantly different from zero (beta = 0.55, t = 4.88), indicating that dispersion was higher in 'hard' words than in 'easy' words, other things being equal. Comparisons of models with and without this predictor indicated that including difficulty in the model produced significant model improvement over a model with vowel type as the only fixed effect (χ²(1) = 15.82, p < 0.0001). The squared correlation of fitted and observed values, which provides a measure of model goodness-of-fit, is .60.
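
A sketch of the model structure just summarized, of the random-slope diagnostic that led to removing the slopes, and of a likelihood-ratio comparison of the kind just reported is given below; the data frame `w2004` and its column names are assumptions, not the exact code of the published analysis.

```r
library(lme4)

# Preliminary replication model: Difficulty, Vowel, and their interaction as fixed effects,
# with random by-talker and by-vowel intercepts (cf. Tables 4 and 5)
m_rep <- lmer(dispersion ~ Difficulty * Vowel + (1 | Talker) + (1 | Vowel), data = w2004)

# Variant with by-talker and by-vowel random slopes for Difficulty; a near-zero slope
# variance or an intercept-slope correlation at +/-1 indicates the slopes are not supported
m_slopes <- lmer(dispersion ~ Difficulty * Vowel +
                   (1 + Difficulty | Talker) + (1 + Difficulty | Vowel), data = w2004)
VarCorr(m_slopes)

# Likelihood-ratio comparison for the contribution of Difficulty (models refitted with ML)
m_vowel <- lmer(dispersion ~ Vowel + (1 | Talker) + (1 | Vowel), data = w2004)
m_diff  <- lmer(dispersion ~ Vowel + Difficulty + (1 | Talker) + (1 | Vowel), data = w2004)
anova(m_vowel, m_diff)
```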

Given that the model of the re-measured data successfully recovered the effect reported in W2004, we next turn to the analysis of the effects of consonantal context. We begin by analysing F1 and F2 in separate models, taking into account known predictors of each.

3.2. Modelling F1 in the W2004 subset

Building on previous literature (Hillenbrand et al., 2001; Stevens & House, 1963; Strange et al., 2007), the following predictors were included as fixed effects: Vowel Height, Voicing and Place of the initial and final consonants, interactions of Vowel Height with Nasality of the initial and final consonants, and Sex. The forward-fitting routine for testing the random effects structure included by-talker random slopes for Vowel height, Nasality, and the interaction of Height by Nasality, and by-vowel random slopes for Voicing, Nasality, and Sex of the talker.

Nasality and place of articulation of the initial consonant, and voicing of the final consonant did not give rise to any significant main effects or interactions. The AIC was 979 for the initial model, and 837 for the final model. The final model is summarized in Tables 6 and 7. The squared correlation of fitted and observed values of F1, as a measure of goodness-of-fit, was .89.

As one might expect, F1 was higher in low vowels than in mid vowels, lower in high vowels compared to mid vowels, and higher for female talkers than for male ones. The model also recovers several previously observed effects of phonetic context: F1 was lower following voiced initial consonants than following voiceless ones, matching the observations in Stevens and House (1963) and Hillenbrand et al. (2001). There was a significant main effect of nasality of the final consonant, such that F1 was lower before nasal consonants than before oral ones. There was also a significant effect of Place of articulation of the final consonant, with F1 being significantly lower before alveolar consonants compared to the reference level (labials).

The absence of a significant interaction of nasality and vowel height may be surprising at first, since the expected effect of nasality is to lower high vowels and raise low ones (Beddor, 1983; Chen, 1997; Flemming, 2010a; Fourakis, 1991). However, the vowels in these previous studies were /i, u, a/, whereas the only vowels that co-occur with final nasals in the W2004 dataset were /ʌ, ɛ, e, ɪ/. The absence of an interaction of nasality with height may thus be due to the absence of the highest (/i, u/) and lowest (/a/) vowels. The absence of a significant effect of initial nasalization may also be surprising. However, there were only six words with initial nasals (knob, mace, mouth, mitt, moat, mum), so the behaviour of the Nasality variable may be due to data sparseness.

It is conceivable that consonantal factors and Difficulty jointly affect F1, and that a model taking both Difficulty and consonantal predictors into account is superior to one that does not. For F1, i.e. the "height" dimension, an effect of difficulty on vowel peripherality might express itself as an interaction of Vowel height with Difficulty: If Difficulty affects vowel space expansion in the manner outlined in W2004, then speakers might raise high vowels, and lower low vowels in 'hard' words. The AIC changed from 837 to 850 when the interaction was in the model, indicating that the model without the interaction of Vowel height with Difficulty was preferable by that criterion. Neither the interaction of Vowel height x Difficulty, nor the simple effect of Difficulty yielded significant model improvement by a log-likelihood criterion.
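
One plausible lme4 specification consistent with Tables 6 and 7, together with the interaction test just described, can be sketched as follows; the data frame `w2004` and the predictor names are assumptions, and the random-effects structure is inferred from Table 7 rather than taken from the published code.

```r
library(lme4)

# F1 model with the fixed effects retained in Table 6 and by-talker random slopes for
# vowel height (cf. Table 7); column names are illustrative
m_f1 <- lmer(F1 ~ Sex + Height + Voice_init + Nasal_fin + Place_fin +
               (1 + Height | Talker) + (1 | Vowel), data = w2004, REML = FALSE)

# Does adding Difficulty and its interaction with vowel height improve the model?
m_f1_diff <- lmer(F1 ~ Sex + Height * Difficulty + Voice_init + Nasal_fin + Place_fin +
                    (1 + Height | Talker) + (1 | Vowel), data = w2004, REML = FALSE)
AIC(m_f1, m_f1_diff)     # AIC comparison
anova(m_f1, m_f1_diff)   # log-likelihood comparison
```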

In sum, a mixed-effects model of F1 reflects several previously-reported effects of nasality, voicing, and place of articulation. An interaction of Difficulty by Vowel height was non-significant, failing to provide evidence for the notion that vowels in the 'hard' condition tend to be more peripheral along the F1 dimension, compared to vowels in the 'easy' condition.

3.3. Modelling F2 in the W2004 subset

Building on the literature summarized in Section 1.2 above, the following predictors were included as fixed effects in the initial model of F2: Sex of the talker (male vs. female), and Vowel frontness (front vs. back) interacting with Place of articulation of the initial and final consonants (null, i.e. /h/, vs. alveolar vs. labial vs. velar), with Manner of articulation of the initial and final consonants (stop vs. fricative vs. approximant (/w/ and /r/) vs. lateral (/l/)), and with Voicing of the initial and final consonants. The initial model also included by-vowel and by-talker random intercepts. The forward-fitting routine for testing the random effects structure included by-talker random slopes for Frontness, as well as by-vowel random slopes for Voicing and Manner of articulation of the final consonants, Place of articulation of the initial consonant, and Sex of the talker. Of these, the only random slopes that were retained were the by-vowel slopes for Sex and the by-talker slope for Frontness.

The interactions of Frontness with Manner and Place of articulation of the final consonant and with Voicing of the initial consonant were eliminated during the backward elimination procedure. The simple effect of Voicing of the initial consonant was also eliminated. Place and manner of the initial consonant and Voicing of the final consonant did produce significant interactions with Frontness. The AIC was 1203 for the initial model, and 802 for the final model. The final model is summarized in Tables 8 and 9. The squared correlation of fitted and observed values of F2, as a measure of goodness-of-fit, was .94.

The model estimates reflect the fact that F2 was higher for female talkers than for male ones, and higher in front vowels than in back and central vowels, as one might expect. The model recovers the effect of fricatives vs. stops in front vs. back vowels reported in Stevens and House (1963): F2 in back vowels was higher after initial fricatives than after initial stops, whereas in front vowels, the opposite was the case. The model further recovers the effects of initial /r/ and /l/ on F2 in front vs. back vowels reported in Tunley (1999). The effects of Place of articulation were complicated. For back and central vowels, F2 was estimated to be higher after alveolars and velars (these vowels were estimated to be fronted by 1.03 and 0.96 Bark, respectively, relative to the reference level /h/). In front vowels, F2 was estimated to be lower following alveolars, labials, and velars, compared to /h/, whereas in back vowels, the opposite was the case. This pattern is similar to that reported in Hillenbrand et al. (2001), except that velar consonants near front vowels in that study patterned more like /h/ and less like alveolars and labials.

Table 8

Fixed effects in a mixed-effects regression model of F2.

Effect Level Beta (SE) t

(Intercept) -3.224 (0.330) -9.78

Sex Female 1.341 (0.270) 4.97

Frontness [front] 4.050 (0.413) 9.808

Mannerfin [stop] 0.049 (0.062) 0.79

Mannerinit [approx] -0.258 (0.118) -2.193

[fric] 0.139 (0.107) 1.293

[lat] -0.346 (0.181) -1.907

Placeinit [alveolar] 1.031 (0.118) 8.74

[labial] 0.342 (0.131) 2.60

[velar] 0.960 (0.186) 5.15

Placefin [alveolar] 0.154 (0.065) 2.36

[velar] 0.141 (0.080) 1.78

Voicefin [voiceless] 0.106 (0.091) 1.17

Frontness x Mannerinit [front] [approx] 0.219 (0.174) 1.26

[front] [fric] -0.479 (0.140) -3.44

[front] [lat] 0.027 (0.284) 0.094

Frontness x Placeinit [front] [alv] -1.062 (0.160) -6.64

[front] [lab] -0.645 (0.178) -3.62

[front] [vel] -1.150 (0.239) -4.81

Frontness x Voicefin [front] [voiceless] -0.320 (0.112) -2.84

Table 9

Random effects in a mixed-effects model of F2, after backward elimination.

Random effect Variance SD

Vowel (intercept) 0.233 0.482

Sex [female] 0.314 0.560

Talker (intercept) 0.173 0.416

Frontness: [front] 0.189 0.434

Residual 0.192 0.438

vowels, the opposite was the case. This pattern is similar to that reported in Hillenbrand et al. (2001), except that velar consonants near front vowels in that study patterned more like /h/ and less like alveolars and labials.

Analogously to the follow-up analysis of F1, we asked whether the model of F2, i.e. the front/back dimension, could be improved by adding an interaction of Difficulty with Vowel Frontness as a predictor: If vowel space expansion is greater in 'hard' words than 'easy' ones, then talkers might increase F2 in front vowels and decrease it in back vowels when saying 'hard' words. The model did not provide evidence that this was the case: The AIC changed from 802 to 805, and there was no significant model improvement by a log-likelihood criterion (χ2 = 0.5315, p=0.77).

In sum, a mixed-effects model of F2 reflected previously-reported effects of talker sex, place and manner of articulation, and voicing. An interaction of Difficulty by Vowel frontness was non-significant, failing to provide any evidence for the notion that vowels in the 'hard' condition tended to be more peripheral along the F2 dimension, compared to vowels in the 'easy' condition.

3.4. Predicting location in F1/F2 space in the W2004 subset

What are the implications of the two models, taken together, for the vowel spaces in the Easy/Hard database? The model parameters provide estimates of the effects of consonant features, such as voicing and nasality, other things being equal. But other things are never equal in the word list: The effect of a preceding voiceless stop on a given vowel may be offset or enhanced by some property of the following consonant, for example. One way to get a sense of the expected vowel spaces, given the model estimates, is to examine the model predictions, i.e. the fitted (or 'predicted') values. A model's fitted values, unlike the parameter estimates, reflect all the information available to the model, based on the fixed and random effects. The fitted values for each token of the word path, for example, will take into account the model estimate for the effects of voiceless consonants, the random adjustments for the vowel /æ/, and the adjustments associated with each talker. Given the token-level predictions, one can analyse the predicted vowel spaces. We carried out two such analyses.
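In lme4, such token-level predictions are obtained with fitted(), which combines the fixed-effect estimates with the random adjustments; the sketch below, with placeholder object and column names (m_f1 and m_f2 stand for the final F1 and F2 models), shows how by-vowel averages of the fitted values, of the kind plotted in Fig. 5, can then be formed.

library(dplyr)

ehd$F1_fit <- fitted(m_f1)   # token-level predictions: fixed + random effects
ehd$F2_fit <- fitted(m_f2)

# By-vowel averages of the fitted values, split by voicing of the initial consonant
fig5_voicing <- ehd %>%
  group_by(Vowel, Voice_init) %>%
  summarise(F1 = mean(F1_fit), F2 = mean(F2_fit), .groups = "drop")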

Fig. 3 above showed the by-vowel averages for F1/F2, averaged by vowel type and Difficulty. Fig. 5 below is analogous to Fig. 3, except that the F1 and F2 values represent by-vowel averages of (uncentered) fitted values for each level of voicing and nasality of the initial consonants. The left panel shows the by-vowel averages for vowels after initial voiced vs. voiceless consonants. The right panel shows the by-vowel averages after initial nasal vs. oral consonants. As one might expect, some consonantal properties that gave rise to significant effects in the models are evident in the predicted values. For example, the left panel of Fig. 5 reflects the fact that vowels tend to be lower (have higher F1 values) after initial voiceless stops than after initial voiced stops, consistent with the parameter estimates of the model of F1. Other patterns emerge in the fitted values without being included among the model parameters. For example, the right hand panel in Fig. 5 shows that the average F2 values of the back vowels /o, ʌ/ were estimated to be lower after nasals than after oral initial consonants. The presence of this pattern does not necessarily reflect an actual effect of nasality of the initial consonant on the vowels /o/ and /ʌ/, nor does it reflect a model estimate of such an effect: The model of F2 did not contain nasality of the initial consonant as a predictor. The difference in the fitted values of these vowels after oral vs. nasal consonants may simply be due to other properties of the words containing these vowels - either the two nasal-initial ones (moat and mum), or their non-nasal counterparts (vote, goat, both, young, love, judge, hung, rough, bug, bum, hum, pup). More generally, differences across group averages can emerge in fitted or observed values regardless of whether the factors defining the grouping play a causal role in bringing about the differences.

Fig. 5. Fitted values of models of F1 and F2 of vowels, averaged by voicing (left panel) and nasality (right panel) of the preceding consonant.

If the pattern of greater vowel dispersion in 'hard' vs. 'easy' words was to be expected, given the combination of consonantal properties of the words in the two conditions, then the difference in vowel dispersion in 'easy' vs. 'hard' words should emerge in the model predictions, as well. The right hand panel of Fig. 6 shows the model predictions for location of F1/F2 means in F1/F2 space for easy and hard words, analogous to Fig. 3 above. For convenience, Fig. 3 is repeated in the left hand panel of Fig. 6. It will be observed that the predicted values for several of the "hard" vowels (/i, a/) appear to be more peripheral than their 'easy' counterparts. For other vowels, the average fitted values of vowels in easy and hard words do not differ in the direction one would expect if vowel spaces are expanded in 'hard' words.

The conclusions in W2004 were not based on the plot reproduced as Fig. 1 above, but on comparisons of vowel dispersion values in easy vs. hard words containing each vowel. Do the predicted dispersion values differ in the direction observed in W2004? We calculated the "predicted dispersion" in the same manner as observed vowel dispersion, i.e. following the method in Bradlow et al. (1996), but using the fitted F1 and F2 values as the F1/F2 coordinates (a sketch of this computation is given below). The predicted dispersion values for each vowel are shown in the lower panel of Fig. 7, which is analogous to Figure 4.3 in W2004, reproduced as Fig. 2 above. To facilitate comparison to the observed dispersion values in our observed (re-measured) data, the observed dispersion values (averaged across the talker means for each vowel in each condition) are shown in the upper panel of Fig. 7. It will be observed that predicted dispersion is in fact higher for vowels in 'hard' words than for vowels in 'easy' words for several of the vowels. As mentioned above, W2004 found the difference between 'easy' and 'hard' words' dispersion "most reliable" in the point vowels /i, a, o, u/. As can be seen in the upper panel of Fig. 7, that difference is replicated in the observed dispersion values for each of those vowels, with the exception of /o/. The same pattern is also found in the predicted dispersion values (Fig. 7, lower panel). For the vowels /i, ɛ, o, a/, W2004 found the easy/hard difference not to be reliable. As can be seen in Fig. 7, we in fact observed higher dispersion in hard vs. easy words with /ɛ/ and /a/, a pattern that is also present in the predicted values (lower panel). For the remaining vowels other than the point vowels (/i/ and /o/),

Fig. 6. Observed (left panel) and predicted (right panel) F1/F2 means, for easy and hard words. Dark symbols indicate vowels in "hard" words. Light grey symbols indicate vowels in "easy" words.

Fig. 7. Observed (top panel) and predicted (bottom panel) average vowel dispersion in 'easy' and 'hard' items for each vowel type.

neither the observed, nor the predicted values in easy vs. hard words differed clearly. In other words, both the presence and the absence of a difference between the 'easy' and 'hard' conditions are expected in certain vowel types, given the model predictions.
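The predicted dispersion measure referred to above can be sketched as follows, under the assumption that the centre of each talker's F1/F2 space is the talker's mean fitted F1 and F2; the data-frame and column names continue the placeholders used in the earlier sketches.

library(dplyr)

pred_disp <- ehd %>%
  group_by(Talker) %>%
  mutate(F1_c = mean(F1_fit),          # talker-specific centre of the fitted space
         F2_c = mean(F2_fit)) %>%
  ungroup() %>%
  mutate(dispersion_pred = sqrt((F1_fit - F1_c)^2 + (F2_fit - F2_c)^2))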

The average predicted dispersion values were 1.81 for the 'easy' words and 1.99 for the 'hard' words. An ANOVA of predicted dispersion, analogous to the ANOVA in W2004, i.e. using the pooled error term, reveals a significant effect of Difficulty (F(1,493) = 15.67, p<0.0001), in the direction observed in Wright (2004). As in W2004, there were also significant main effects of vowel type and a significant interaction between 'difficulty' and vowel type (F(9,493) = 3.70, p=0.0002). These results suggest that the pattern of greater dispersion in 'hard' words emerges in the F1/F2 coordinates predicted based on vowel type and phonetic context.
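The corresponding ANOVA can be sketched with base R's aov(); the exact specification (e.g. whether Talker also enters as a factor) follows W2004 and is not repeated here, so only the Difficulty and Vowel terms are shown.

# Difficulty and Vowel are assumed to be coded as factors in pred_disp
summary(aov(dispersion_pred ~ Difficulty * Vowel, data = pred_disp))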

The easy/hard difference in dispersion is present in the observed data, and is therefore bound to be present to some degree in the fitted values of a model that succeeds in modelling that difference; the question is what the difference is due to. A case could be made that the analysis of model predictions should not take the random effects into account, as the talker-specific or vowel-specific variability captured by those effects does not necessarily reflect phonetic context. Since the hypothesis investigated here is that the easy vs. hard difference was due to phonetic context, rather than talker-specific variation, we carried out a second analysis of model predictions, this time taking into account only the fixed effects (i.e. the phonetic context variables). In that analysis, we focused not on the fitted values of the full models, but on the parameters for the fixed effects in our models of F1 and F2, i.e. the estimates of the effects of phonetic context. As before, we used the resulting predicted F1 and F2 values to examine whether the predicted dispersion was higher for vowels in hard words than in easy ones. This was the case.
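In lme4, predictions based on the fixed effects alone are obtained by setting re.form = NA in predict(); a minimal sketch, again with placeholder names:

ehd$F1_fix <- predict(m_f1, re.form = NA)   # fixed-effects-only predictions
ehd$F2_fix <- predict(m_f2, re.form = NA)
# Predicted dispersion is then recomputed from (F1_fix, F2_fix) as above and
# compared across the 'easy' and 'hard' conditions.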

3.5. The larger dataset: comparing by-hand and automatic formant measurements

The results presented so far are consistent with the hypothesis that vowel dispersion in easy vs. hard words in the data examined in W2004 reflects consonantal influences on vowel formants, rather than recognition difficulty. However, this situation could conceivably be due to a systematic confound between recognition difficulty and phonetic factors, either in the small dataset examined so far, or in the English lexicon as a whole: Phonotactic probability and phonological neighbourhood density are correlated (Vitevitch & Luce, 1999; Vitevitch, Luce, Pisoni, & Auer, 1999). More generally, dense vs. sparse neighbourhoods in the lexicon have different phonological characteristics (Frauenfelder, Baayen, Hellwig, & Schreuder, 1993; Kessler & Treiman, 1997).

The dataset examined in W2004 is too small to separate consonantal features and recognition difficulty. Fortunately, a larger set of recordings from the same talkers is available, in the form of the Easy/Hard database (Torretta, 1995). The larger dataset allows us to ask whether similar relationships between consonantal features and F1 and F2 hold in the larger set as in the subset, and whether the F1/F2 coordinates one can expect, given those relationships, once again yield an apparent effect of recognition difficulty. If they do, then this is consistent with a scenario, raised by several anonymous reviewers, in which it is simply impossible to disentangle the effects of recognition difficulty and consonantal features.

Since by-hand measurements of the larger database were not feasible, we used automatic alignment and formant extraction in the analysis of the larger set. As a first step, we examined the automatic measurements of the same 600 tokens previously analysed by hand, to gauge whether the automatic measurements yielded results that were comparable to the by-hand measurements. The results were in fact similar. As in the by-hand measurements, the vowel [o] patterned in the opposite direction (2.76 for the 'easy' condition vs. 2.23 for the 'hard' condition) from that observed by W1997. As a reviewer points out, the vowels involved in the caught/cot distinction are in flux and may not be /o/-like in the speech of all talkers. The overall mean dispersion based on the automatic measurements of the words in the W2004 subset (1.74 for 'easy' words, 1.95 for 'hard' words) was lower than in the by-hand measurements in the current study (1.84 for 'easy' words, 2.08 for 'hard' words) and the means reported in W2004 (approximately 1.8 for the easy words and 2.3 for the hard words, according to Fig. 4.1 in W2004), but differed in the same direction. A likely reason for the lower dispersion values based on the automatic measurements is that the by-hand measurements were made at the point of maximal displacement, i.e. the point "when F1 and F2 are the most characteristic for that particular vowel" (Wright, 2004: 80). For vowels characterized by especially high or especially low F1 or F2 or F1/F2 ratios, that criterion often means that the measurement represents the extreme F1 and/or F2 for a given vowel token (except in cases where the points of maximal displacement for F1 and F2 did not coincide: "[w]here F1 and F2 were not in agreement, F1 was taken as the point of reference and F2 was measured at that point", W2004). As a result, the 'maximal displacement' criterion favors extreme distances from the center of vowel space as the point where measurements are taken. The automatic formant extraction reflects formant measurements at the temporal midpoint, regardless of whether that point happens to be the point of maximal displacement, and will therefore tend to reduce the mean values for classes of vowels that are most likely to contribute extreme points - vowels in the 'hard' words, assuming the effect observed in W1997 is present. Nevertheless, the overall pattern was similar: Dispersion was higher on average in 'hard' words compared to 'easy' words.

Applying the same ANOVA model as in the by-hand measurements, to retrace the analytic steps in W2004, we observed the same pattern of results as that reported in W2004: There was a significant main effect of difficulty, F(1,9) = 29.48, p<0.0001. As in W2004, there were also significant main effects of talker and vowel type, as well as a significant interaction between difficulty and vowel type (all p<0.0001). A mixed-effects regression model with the same random effects structure as for the by-hand measurements supports the same conclusion for the automatic measurements as for the by-hand measurements: When phonetic context is not taken into account, vowel dispersion is greater in 'hard' words compared to 'easy' words (beta = 0.58, t=4.45). We conclude that the automatic measurements yield results broadly similar to those of the by-hand measurements. With that intermediate result in hand, we turn to the analysis of the automatic measurements of F1 and F2 in the larger data set.
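A sketch of the dispersion model for the automatic measurements follows; the random-effects structure shown (by-talker and by-vowel intercepts) stands in for the structure used for the by-hand measurements, which is described earlier in the paper, and the object and column names are placeholders.

library(lme4)

m_disp_auto <- lmer(dispersion ~ Difficulty + (1 | Talker) + (1 | Vowel),
                    data = auto_meas, REML = FALSE)
summary(m_disp_auto)$coefficients   # estimate, SE, and t for Difficulty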

3.6. Models of F1 and F2 in the larger dataset

The full data set consists of three sets of recordings. Participants were asked to say words at three different speaking rates ("fast", "medium", and "slow"). Due to space limitations, we focus here on the analysis of the "medium" speaking rate, i.e. the condition analysed in W2004. We fitted models to the larger set of measurements, using as predictors the known determinants of F1 and F2 reviewed in Section 1.2 above.

The models are summarized in Appendix A. Inspection of the model residuals revealed departures from normality in the models of F1 and F2, particularly for very low and, to a lesser extent, very high formant ranges. It is possible that these values were poorly fitted because they resulted from errors in automatic formant tracking. Model residuals for models refitted to the portion of the data without nasal consonants came much closer to normal, suggesting that nasalized vowels were particularly vulnerable to formant tracking errors.

The pattern of significant fixed effects was similar, though not the same as for the smaller data set: F1 was higher in female talkers than in male ones, higher in low vowels and lower in high vowels, compared to the reference level (mid vowels). The larger dataset, in which more combinations of vowel type and nasality of the final consonant were available, supported an interaction of Nasality and Vowel height, as follows: In mid and low vowels, but not in high vowels, F1 was lower before nasal consonants than before non-nasal consonants. This pattern of centralization along the F1 dimension matches previous observations about the effects of neighbouring nasal consonants on vowels (Beddor, 1983; Chen, 1997; Flemming, 2010a; Fourakis, 1991). There were also significant effects of the place of articulation of the final consonant: As in the smaller dataset, F1 was lower before alveolars, compared to before labials (the reference level). Unlike in the smaller dataset, F1 was also significantly lower before velars than before labials. Lowering of F1 near alveolars was observed previously for some vowels (Hillenbrand et al., 2001; Stevens & House, 1963; Strange et al., 2007). Given that this tendency was only observed for some vowels, but not others, one might expect a by-vowel random slope for Place of articulation to capture this pattern, but that random slope was not retained in the model. The squared correlation of fitted and observed values was .80.

The model specification of the initial model of F2 (before backward elimination) was also the same as in the model of the smaller dataset. The final model was quite similar to the final model of the subset. As expected, F2 was higher in female talkers than in male ones, and higher in front vowels than in central and back ones. Final alveolars were associated with increased F2 in central and back vowels, relative to the reference level (labials), a tendency that was attenuated in front vowels. Vowels followed by velars, whether front, central, or back, were associated with increased F2. The estimated effects of manner of articulation of the initial and final consonants were similar, with some differences resulting from the fact that the full dataset contained words with final approximants (excluded in W2004). Final fricatives and stops were associated with increased F2 in back and central vowels, a tendency that was reversed or attenuated in front vowels. There was also a significant interaction of initial and final voicing and vowel frontness, such that neighbouring voiceless consonants were associated with an increase in F2 in back and central vowels, and a decrease in F2 in front vowels. The squared correlation of fitted and observed values was .95.

Analogously to the follow-up analyses of F1 and F2 in the W2004 subset, we asked whether the models of F1 and F2 in the larger dataset could be improved by adding an interaction of Difficulty with Vowel height (in the F1 model) or Frontness (in the F2 model) as a predictor.

For F1, adding Difficulty and the Difficulty x Height interaction failed to produce model improvement (AIC = 1732 with the interaction vs. 1716 without). The parameter estimates for the effect of Difficulty on high vowels indicated that high vowels were raised relative to the baseline level (i.e. mid vowels) in 'hard' words, but not in 'easy' words (β = -0.19, t = -2.33), consistent with what one would expect if talkers pronounced 'hard' words with more expanded vowel spaces along the F1 dimension. Low vowels in 'hard' words were not found to be lowered relative to low vowels in 'easy' words (β = -0.02, t = -0.326).

For F2, adding the interaction of Difficulty with Frontness did result in model improvement by the AIC. The AIC for the model was 1670 without the interaction, and 1554 with the interaction. There was significant model improvement by a log likelihood criterion, as well (-2LL = 120.38, p(χ2)<0.0001). However, neither the main effect of Difficulty, nor the estimate for the effect of Difficulty on back vs. front vowels gave any indication of an effect of Difficulty on vowel space expansion along the F2 dimension.

Taken together, the models of the larger dataset establish that it is possible to disentangle the effects of recognition difficulty and segmental context on vowel dispersion: The effects of segmental context matched previous findings in the literature. There was little evidence for any explanatory role of Difficulty. For F1, taking Difficulty into consideration did not result in model improvement. For F2, Difficulty yielded model improvement, but without indicating any effect of Difficulty on vowel space expansion.

3.7. Follow-up analyses and alternative modelling approaches

The stated purpose of W2004 was "to examine the degree to which factors in lexical competition that are known to affect intelligibility of individual words influence the carefulness with which talkers produce words" (W2004). That purpose was pursued in W2004 by asking whether dispersion was greater in 'hard' words than in 'easy' ones. Previous research (Flemming, 2010b; Lindblom, 1990) suggests a further way in which recognition difficulty might affect the phonetic realization of vowels: Talkers may resist or enhance coarticulatory processes if doing so helps ensure the intelligibility of 'hard' words. In other words, the degree of context-dependent articulatory undershoot might itself be a function of recognition difficulty. Scarborough (2013) suggests that the degree of nasal coarticulation varies with neighbourhood density in just this way. With respect to vowel dispersion, i.e. W2004's outcome variable, this proposal means that rather than increasing dispersion for 'hard' targets across the board, talkers might expend effort in order to minimize target undershoot in 'hard' words, while allowing target undershoot in 'easy' words. By decreasing the amount of context-dependent undershoot in 'hard' words, talkers might keep these words from being even harder to understand. As a result, vowels in 'easy' words might tend to centralize in the vicinity of certain consonants, but vowels in 'hard' words need not.

For models of F1 and F2, this idea can be tested by asking if the various consonantal predictors of formant values produce significant interactions with Difficulty. If they do, and if the direction of the effects is consistent with vowels in 'hard' words being pronounced so as to maximize intelligibility, then this would provide evidence consistent with W2004's core claims.

We explored this idea by letting Difficulty interact with the significant predictors of F1 and F2, and with the features Height (for F1) and Frontness (for F2), respectively. Recall that there was a significant interaction of Vowel height with Nasality of the final consonant in the larger dataset, such that high vowels were lowered before nasals, while mid and low vowels were raised, compared to vowels before oral consonants. If there is a tendency for vowels in 'hard' words to resist centralization processes, then the interaction of height and nasality might itself be conditioned on recognition difficulty, producing a three-way interaction of Height by Nasality by Difficulty.

The small size of the W2004 word list, with its incomplete crossing of several of the segmental variables with difficulty, precluded testing all relevant predictors in this manner. The final model of F1 after backward elimination included significant main effects of initial voicing, final nasality, and final place of articulation (cf. Table 6 above). Letting each of these interact with Height and Difficulty results in singularities. The only predictor for which it was possible to test the hypothesis outlined in this section was initial voicing. The model including the three-way interaction of Initial Voicing x Height x Difficulty was not preferable based on the AIC (AIC = 857 with the interaction vs. 837 without). For F2, the only predictor that was significant in the final model of F2 (cf. Table 8) for which a three-way interaction could be tested was voicing of the final consonant. Adding the three-way interaction of Frontness x Final voicing x Difficulty failed to result in model improvement, either by the AIC or by a log likelihood criterion (AIC = 805 with the interaction, AIC = 802 without the interaction).

The larger dataset allowed us to explore the role of the easy/hard distinction in segmentally conditioned formant changes more fully. For F1, recall that there was a significant interaction of Vowel height with Nasality of the final consonant, such that high vowels were lowered before nasals, while mid and low vowels were raised, compared to F1 before oral consonants. Similarly, there was a significant interaction of initial voicing with vowel height, such that low vowels were raised following voiced consonants. Adding the three-way interactions of Vowel height x Final nasality x Difficulty and Vowel height x Initial voicing x Difficulty did not result in model improvement (AIC with the interactions = 1724, AIC without the interactions = 1716).
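These tests can be sketched by letting Difficulty enter the existing interactions; m_f1_large stands for the final F1 model of the larger dataset, and the predictor names remain placeholders.

# Add Difficulty and its two- and three-way interactions with the terms of interest
m_f1_3way <- update(m_f1_large,
                    . ~ . + Difficulty * Height * (Nasal_fin + Voice_init))

AIC(m_f1_large, m_f1_3way)     # compare with and without the Difficulty terms
anova(m_f1_large, m_f1_3way)   # log-likelihood test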

For F2, there were several predictors for which it was possible to evaluate the hypothesis that segmental effects on vowel formants might be conditioned on recognition difficulty. Four of these interactions yielded significant model improvement when added to the model of F2, but there was no clear pattern in the direction of the effects: In two cases (the interactions of Frontness x Difficulty with place of articulation and with voicing of the final consonants), vowels in 'hard' words were less peripheral, not more peripheral, than vowels in easy words in environments that affected vowel frontness. In two other cases (the interactions of Frontness x Difficulty x Voicing of the initial consonant), the pattern was consistent with the idea that vowels in 'hard' words might be more peripheral than vowels in 'easy' words in such environments. In sum, the pattern of non-significant interactions, and significant interactions consistent with vowels in 'hard' words being either more or less peripheral than vowels in 'easy' words offers little evidence supporting the scenario of recognition-conditioned coarticulation.

The analyses carried out so far accepted one of the premises of W2004, which was that the difference in dispersion of vowels in easy vs. hard words might reflect talkers' desire to maximize intelligibility, and that vowel centralization might hinder ease of recognition. The relationship between dispersion and intelligibility of vowels is complicated, a point that we take up in the discussion.

4. Discussion

The present set of analyses puts the oft-cited findings in W2004 in a new light. The findings reported in Wright (2004) are often taken as evidence in support of the hypothesis that talkers modify their speech so as to ensure its intelligibility. The current study demonstrates that the observed difference between vowels in 'easy' and 'hard' words in the data analysed in W2004 can be expected, given previous findings on the effect of consonantal environment on vowel formants: We fitted models of F1 and F2, using predictor variables grounded in previous phonetic investigations of the effects of neighbouring consonants on vowel formants. The factors that emerged as significant predictors of vowel formants in those models largely matched previous observations. Analysis of the fitted values of the phonetically-grounded models of F1 and F2 showed that the formant values predicted by those models separate vowels in "easy" and "hard" words in the manner observed in W2004. Given the role that W2004 has played in discussions not just of vowel dispersion, but of perceptual factors in pronunciation variation generally, it is important to consider the implications of the current set of analyses for that general discussion.

One possibility, suggested by an anonymous reviewer, is that the ability of segmental factors to predict patterns of variation in vowel spaces similar to those attributed to recognition difficulty is due to a systematic correlation between recognition difficulty and segmental content. Phonotactic probability and phonological neighbourhood density are known to be correlated. The existence of the larger dataset of which W2004 formed a subset made it possible to explore this possibility further. We found that the consonantal factors were predictive of vowel formants in the larger dataset, whereas recognition difficulty was not. This suggests that it is possible to separate the effects of segmental context and recognition difficulty. Since segmental context is a significant predictor of vowel formants in both the subset and the superset, whereas the easy/hard distinction is not predictive in the larger set, we take the two sets of results, considered together, to mean that the observed higher dispersion of hard vs. easy words in the smaller subset is best explained by the segmental factors.

At one level, the present findings are far from surprising: It has long been known that consonantal context affects vowel formants, not just at vowel boundaries, but potentially throughout the vowel (Joos, 1948; Stevens & House, 1963). That being so, researchers investigating additional determinants of pronunciation variation seek to control for segmental context and other known predictors of pronunciation variation as a matter of course. When such factors are incompletely balanced, and not taken into consideration in the analysis, the resulting pronunciations cannot provide conclusive evidence for or against the role of other predictors.

At a different level, the results of the current study are surprising, given how frequently W2004 is cited as evidence establishing the role of auditory recognition difficulty, and listener-oriented factors more broadly, in pronunciation variation (Harrington, 2010) or a global pattern of uniform information density in speech (Aylett & Turk, 2006). The implications of the current study thus reach beyond the interpretation of W2004's results.

W2004 inspired a line of research into the role of auditory recognition difficulty in pronunciation variation. We believe that the word lists in some of those studies may be flawed in ways similar to the dataset in W2004. Some of the studies following up on W2004 (Kilanski, 2009; Stephenson, 2004) used word lists that overlapped with W2004's list or that similarly appear to contain segmental confounds. Munson (2007), for example, noted that vowel dispersion tended to be lower near alveolar consonants than near labial or velar consonants and sought to control for this tendency by keeping the number of alveolar contexts approximately equal across the four sets of words tested in that study. Munson (2007) reports that a chi-square test comparing the number of alveolars in the four groups to the distribution one might expect by chance was non-significant, seemingly indicating that the control was successful. However, the two groups of words that showed the clearest difference in dispersion were the 'low-frequency, high-density' and 'high-frequency, low-density' words, corresponding, respectively, to the 'hard' and the 'easy' conditions in the Easy/Hard database. In those two groups, the distribution of alveolars and other types of consonants was exactly reversed (14 alveolars and 6 non-alveolars in the 'easy' set, vs. 6 alveolars and 14 non-alveolars in the 'hard' set). This means that the pattern of lower dispersion in the 'easy' set may have arisen because of the greater number of alveolar contexts in that set, the very confound that Munson (2007) anticipated and sought to eliminate. More generally, the current findings underscore the need to scrutinize the empirical basis of studies - including our own - attributing aspects of pronunciation variation to lexical factors without careful consideration of phonetic context.

One of the variables that contributed to recognition difficulty in W2004, phonological neighbourhood density, has played a pivotal role in studies of the structure of the mental lexicon and its effect on language production and comprehension (Chen & Mirman, 2012; Dell & Gordon, 2003; Luce & Pisoni, 1998; Vitevitch & Luce, 1998; Vitevitch et al., 1999). Phonological neighbourhood density has been shown to give rise to complex and seemingly contradictory effects on language processing: In auditory word recognition, similar-sounding words compete, which is why words in dense neighbourhoods (i.e. with many neighbours, and/or high-frequency neighbours) tend to be difficult targets for recognition. On the other hand, in spoken word production, words in dense neighbourhoods are retrieved more quickly and more accurately (Dell & Gordon, 2003; Vitevitch, 2002; Vitevitch & Sommers, 2003). That pattern bears out the predictions of models of lexical access and retrieval that involve feedback from shared phonological segments to the activation of lexical targets, as argued in Chen and Mirman (2012); Dell and Gordon (2003); Vitevitch and Sommers (2003).

From the point of view of models of language production involving joint activation of (lexical and sublexical) units and feedback from phonological to lexical forms, W2004 in its usual interpretation was unexpected. The patterns reported in Gahl et al. (2012), on the other hand, mesh well with those models: Gahl et al. found that, when other determinants of word duration and vowel dispersion were controlled, words with many neighbours were shorter and tended to be pronounced with more centralized vowels, than words with few neighbours. That pattern is expected if faster lexical retrieval is associated with phonetic reduction in connected speech - as is often assumed in explanations of the shortening and reduction of high-frequency and high-probability words (Bell et al., 2003, 2009; Gahl, 2008).

There are additional considerations that need to be taken into account in understanding lexical access and pronunciation variation in tasks probing the effects of lexical structure, such as auditory confusability vs. segment overlap, attentional demands, the presence of specific contrasting words within an experiment, and the spread of the phonological neighbourhoods (Baese-Berk & Goldrick, 2009; Strand & Sommers, 2011; Vitevitch, Armbruster, & Chu, 2004). Moreover, not all aspects of pronunciation variation are suitable means of tracking the role of perceptual factors. Nasal coarticulation, for example, provides valuable cues for the presence of neighbouring nasal consonants that listeners can and do exploit (Salverda, Dahan, & McQueen, 2003) - and at the same time, allowing nasal coarticulation may be less effortful for speakers, as well, perhaps particularly in the case of words that are retrieved relatively quickly. Thus, the degree of nasal coarticulation can in principle be explained both based on perceptual factors or based on feedback from phonological to lexical targets (Scarborough, 2013).

In fact, the same might be true for consonant-vowel coarticulation: Coarticulation results in cues that might aid word recognition (though not recognition of vowels in isolation), even when it results in vowel centralization. Therefore, the presence of coarticulatory effects in the data does not offer conclusive evidence distinguishing between explanations of pronunciation variation based on perceptual factors vs. phonological feedback in spoken word production.

For the sake of examining previous claims about vowel dispersion and recognition difficulty, we accepted the premise that talkers might increase dispersion to try to offset recognition difficulty of hard vs. easy words. That premise is grounded in work showing that talkers whose speech received high intelligibility ratings had larger F1/F2 spaces than talkers whose speech received low intelligibility ratings (e.g. Bradlow et al., 1996), and that vowel space expansion is a feature of "clear speech" (cf. Uchanski, 2008). However, that observation does not entail predictions about specific vowel tokens or vowels in specific words: The presence of highly peripheral vowels in a speech sample will cause the observed vowel space to be large, even if other vowels in the sample are centralized.

The results of the current study in no way imply that there are no effects of phonological neighbourhood density on pronunciation, or on language production more generally. We believe that the null effect of recognition difficulty in the Easy/Hard database (when segmental context is controlled) is analogous to null effects of lexical frequency that have been observed in single-word naming tasks: There is no doubt that word frequency affects lexical access and retrieval in recognition, as well as in production, and that it affects pronunciation (Bell et al., 2009; Bybee, 2001; Pierrehumbert, 2001; Pluymaekers et al., 2005a; Pluymaekers, Ernestus, & Baayen, 2005b). But, as we have pointed out before (Gahl, 2008), experimental tasks in which words or short phrases are produced one at a time provide a highly unreliable means of detecting effects of lexical frequency on phonetic realization. In part, this is due to the even pacing of word lists (Kello & Plaut, 2003); in part, it is due to effects of words processed during the experiment itself (Baayen, 2007; see e.g. Baayen, Wurm, & Aycock, 2007). These properties of single-word production tasks do not call the reality of lexical frequency effects into question.

Ultimately, a fuller understanding of effects of recognition difficulty and other predictors of pronunciation variation requires a more extensive word list than what is currently available in any study that we are aware of. With a more varied and more balanced set of words, many more interactions of segmental and other information could usefully be explored. However, longer word lists will not be sufficient for a fuller understanding of the role of segmental and lexical factors in pronunciation variation. Such an understanding also requires close attention to the task demands of a given context. One of the most serious limitations, in our view, of the available data does not concern the word list itself, but rather the absence of information on contextual factors such as the order of presentation, the inter-trial intervals, and the latencies between the presentation of each word and the onset of speech. Each of these factors has the potential to affect pronunciation. If pronunciation variation is to be a window on utterance planning and articulatory target variability, then databases need to include such contextual information.

5. Conclusions

The findings presented here should not be taken to mean that phonological neighbourhood density does not affect pronunciation, or that recognition difficulty never affects pronunciation. Just as the existence of clear speech phenomena does not constitute evidence in support of the claims made in W2004 and subsequent studies, the results of the current study do not, and are not intended to, call into question the existence of listener-oriented or clear-speech phenomena more broadly, or other instances of the role of perceptual considerations in speech. In our view, the question is not whether recognition difficulty (and other lexical factors) affect pronunciation, but under what circumstances they do so, what their effects are exactly, and what the mechanisms are that underlie the observed variation.

Acknowledgements

I am grateful to David Pisoni and Luis Hernandez for allowing me access to the recordings in the Easy/Hard database, to Ronald Sprouse for technical assistance, to Vanja Dukic for last-minute statistical advice, to Grace Neveu for research assistance, to Richard Wright, Ken deJong and several anonymous reviewers for their thoughtful comments, and to Keith Johnson for his comments and encouragement.

Appendix A

See Tables A1-A4.

Table A1

Fixed effects in a mixed-effects regression model of F1.

Effect Level Beta (SE) t

(Intercept) -0.836 (0.205) -4.079

Sex Female 1.025 (0.142) 7.197

Height [high] -1.246 (0.27) -4.619

[low] 1.555 (0.229) 6.797

Nasalfin [nasal] -0.008 (0.055) -0.143

Placefin [alveolar] -0.162 (0.038) -4.292

[velar] -0.06 (0.049) -1.21

Voiceinit [voiced] 0.1 (0.053) 1.897

Height x Nasalfin [high] [nasal] -0.088 (0.1) -0.877

[low] [nasal] -0.219 (0.075) -2.929

Height x Voiceinit [high] [voiced] -0.103 (0.084) -1.229

[low] [voiced] -0.202 (0.068) -2.977

Table A2

Random effects in a mixed-effects model of F1.

Random effect Variance SD

Vowel (intercept) 0.074 0.273

Talker (intercept) 0.091 0.302

Vowel height [+ high] 0.067 0.260

Vowel height [+low] 0.098 0.312

Residual 0.223 0.472

Table A3

Fixed effects in a mixed-effects model of F2, n = 1244.

Effect Level Beta (SE) t

(Intercept) -4.465 (0.320) -13.94

Sex Female 1.458 (0.215) 6.780

Frontness [front] 4.872 (0.431) 11.313

Mannerfin [fricative] 1.394 (0.091) 15.233

[stop] 1.55 (0.075) 20.745

Mannerinit [approx] -0.509 (0.070) -7.310

[fric] 0.157 (0.054) 2.881

[lat] -0.179 (0.100) -1.796

Placefin [alveolar] 0.332 (0.050) 6.635

[velar] 0.114 (0.074) 1.537

Voiceinit [voiced] 0.399 (0.053) 7.585

Voicefin [voiced] 0.445 (0.052) 8.578

Frontness x Mannerfin [front] [fric] -1.118 (0.130) -8.621

[front] [stop] -1.127 (0.106) -10.628

Frontness x Mannerinit [front] [approx] 0.089 (0.093) 0.959

[front] [fric] -0.447 (0.077) -5.805

[front] [lat] -0.189 (0.122) -1.556

Frontness x Placefin [front] [alv] -0.209 (0.074) -2.813

[front] [vel] -0.013 (0.098) -0.131

Frontness x Voiceinit [front] [voiced] -0.453 (0.074) -6.08

Frontness x Voicefin [front] [voiced] -0.631 (0.067) -9.428

Table A4

Random effects in a mixed-effects model of F2, n =1244.

Random effect Variance SD

Vowel (intercept) 0.332 0.576

Sex [female] 0.160 0.400

Talker (intercept) 0.174 0.417

Frontness: [front] 0.248 0.499

Residual 0.202 0.449

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716-723.

Arnold, J. E., Tanenhaus, M. K., Altmann, R. J., & Fagnano, M. (2004). The old and thee, uh, new: Disfluency and reference resolution. Psychological Science, 15, 578-582.

Aylett, M., & Turk, A. (2004). The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence and duration in spontaneous speech. Language and Speech, 47, 31-56.

Aylett, M., & Turk, A. (2006). Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. Journal of the Acoustical Society of America, 119, 3048-3058.

Baayen, H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.

Baayen, H. (2007). Storage and computation in the mental lexicon. In G. Jarema, & G. Libben (Eds.), The mental lexicon: Core perspectives (pp. 81-104). Amsterdam: Elsevier.

Baayen, H., Wurm, L. H., & Aycock, J. (2007). Lexical dynamics for low-frequency words: A regression study across tasks and modalities. The Mental Lexicon, 2, 419-463.

Baese-Berk, M., & Goldrick, M. (2009). Mechanisms of interaction in speech production. Language and Cognitive Processes, 24, 527-554.

Balota, D. A., Boland, J. E., & Shields, L. W. (1989). Priming in pronunciation: Beyond pattern recognition and onset latency. Journal of Memory and Language, 28, 14-36.

Balota, D. A., & Chumbley, J. I. (1985). The locus of word-frequency effects in the pronunciation task: Lexical access and/or production? Journal of Memory & Language, 24, 89-106.

Bard, E. G., Anderson, A. H., Sotillo, C., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory & Language, 42, 1-22.

Barr, D., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255-278.

Bates, D., & Maechler, M. (2010). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-33. (http://CRAN.R-project.org/package=lme4).

Beddor, P. S. (1983). Phonological and phonetic effects of nasalization on vowel height (unpublished dissertation). Bloomington: Indiana University.

Bell, A., Brenier, J., Gregory, M., Girand, C., & Jurafsky, D. (2009). Predictability effects on durations of content and function words in conversational English. Journal of Memory and Language, 60, 92-111.

Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., & Gildea, D. (2003). Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America, 113, 1001-1024.

Bradlow, A. R., & Pisoni, D. B. (1999). Recognition of spoken words by native and non-native listeners: Talker-, listener-, and item-related factors. Journal of the Acoustical Society of America, 106, 2074-2085.

Bradlow, A. R., Torretta, G., & Pisoni, D. B. (1996). Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Communication, 20, 255-272.

Bybee, J. (2001). Phonology and language use. Cambridge: Cambridge University Press.

Byrd, D. (1994). Relations of sex and dialect to reduction. Speech Communication, 15, 39-54.

Chen, M. Y. (1997). Acoustic correlates of English and French nasalized vowels. Journal of the Acoustical Society of America, 102, 2360-2370.

Chen, Q., & Mirman, D. (2012). Competition and cooperation among similar representations: Toward a unified account of facilitative and inhibitory effects of lexical neighbors. Psychological Review, 119, 417-430.

Clopper, C. G., Pisoni, D. B., & de Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of America, 118, 1661-1676.

Clopper, C. G., & Tamati, T. N. (2014). Effects of local lexical competition and regional dialect on vowel production. Journal of the Acoustical Society of America, 136, 1-4.

Dell, G. S., & Gordon, J. K. (2003). Neighbors in the lexicon: Friends or foes?. In N. O. Schiller, & A. S. Meyer (Eds.), Phonetics and phonology in language comprehension and production (pp. 9-47). New York: Mouton De Gruyter.

Ferguson, S. H. (2007). Talker differences in clear and conversational speech: Acoustic characteristics of vowels. Journal of Speech, Language, and Hearing Research, 50, 1241-1255.

Flemming, E. (2010a). The phonetics of schwa vowels. In D. Minkova (Ed.), Phonological weakness in English (pp. 78-95). Houndmills: Palgrave Macmillan.

Flemming, E. (2010b). Modeling listeners: Comments on Pluymaekers et al. and Scarborough. In C. Fougeron, B. Kuhnert, M. D'Imperio, & N. Vallée (Eds.), Laboratory phonology, 10 (pp. 587-606). Berlin: Mouton De Gruyter.

Fourakis, M. (1991). Tempo, stress, and vowel reduction in American English. Journal of the Acoustical Society of America, 90, 1816-1827.

Frauenfelder, U. H., Baayen, H., Hellwig, F. M., & Schreuder, R. (1993). Neighborhood density and frequency across languages and modalities. Journal of Memory and Language, 32, 781 -804.

Gahl, S. (2008). "Time" and "thyme" are not homophones: Word durations in spontaneous speech. Language, 84, 474-496.

Gahl, S., Yao, Y., & Johnson, K. (2012). Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language, 66, 789-806.

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

Goldrick, M., Vaughn, C., & Murphy, A. (2013). The effects of lexical neighbors on stop consonant articulation. Journal of the Acoustical Society of America, 134, EL172-EL177.

Gordon, J. (2014). The aging neighborhood: Phonological density in naming. Language and Cognitive Processes, 29, 326-344.

Harrell, F. E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York: Springer.

Harrington, J. (2010). Acoustic phonetics. In J. Laver, & W. J. Hardcastle (Eds.), The handbook of phonetic sciences (2nd ed.). Oxford: Blackwell.

Hawkins, S. (2003). Roles and representations of systematic fine phonetic details in speech understanding. Journal of Phonetics, 31, 373-405.

Hawkins, S., & Slater, A. (1994). Spread of CV and V-to-V coarticulation in British English: Implications for the intelligibility of synthetic speech. ICSLP, 1, 57-60.

Hillenbrand, J. M., Clark, M. J., & Nearey, T. M. (2001). Effects of consonant environment on vowel formant patterns. Journal of the Acoustical Society of America, 109, 748-763.

Howes, D. (1957). On the relation between the intelligibility and frequency of occurrence of English words. Journal of the Acoustical Society of America, 29, 296-305.

Jescheniak, J. D., & Levelt, W. J. M. (1994). Word frequency effects in speech production: Retrieval of syntactic information and of phonological form. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 824-843.

Joos, M. (1948). Acoustic phonetics. Baltimore: Linguistic Society of America.

Kello, C. T., & Plaut, D. C. (2003). Strategic control over rate of processing in word reading: A computational investigation. Journal of Memory and Language, 48, 207-232.

Kessler, B., & Treiman, R. (1997). Syllable structure and the distribution of phonemes in English syllables. Journal of Memory and Language, 37, 295-311.

Kilanski, K. (2009). The effects of token frequency and phonological neighborhood density on native and non-native speech production. Seattle: University of Washington.

Krause, J., & Braida, L. (2004). Acoustic properties of naturally produced clear speech at normal speaking rates. Journal of the Acoustical Society of America, 115, 362-378.

Labov, W., Ash, S., & Boberg, C. (2005). Atlas of North American English: Phonetics, phonology and sound change. Berlin, New York: Mouton De Gruyter.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA, US: The MIT Press.

Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1-75.

Lindblom, B. (1983). Economy of speech gestures. In P. MacNeilage (Ed.), The production of speech (pp. 217-245). New York: Springer.

Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H&H theory. In W. J. Hardcastle, & A. Marchal (Eds.), Speech production and speech modeling (pp. 403-439). Dordrecht: Kluwer.

Luce, P. A., & Pisoni, D. B. (1987). Neighborhoods of words in the mental lexicon. Research on speech perception technical report No. 6. Bloomington, IN: Speech Research Laboratory, Indiana University.

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear & Hearing, 19, 1-36.

Luce, P. A., Pisoni, D. B., & Goldinger, S. D. (1990). Similarity neighborhoods of spoken words. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 122-147). Cambridge, MA, US: The MIT Press.

Moon, S.-J., & Lindblom, B. (1994). Interaction between duration, context, and speaking style in English stressed vowels. Journal of the Acoustical Society of America, 96, 40-55.

Munson, B. (2007). Lexical access, lexical representation, and vowel production. In J. Cole, & J. I. Hualde (Eds.), Laboratory phonology 9: Phonology and phonetics (pp. 201-227). Berlin: Mouton de Gruyter.

Munson, B., & Solomon, N. P. (2004). The effect of phonological neighborhood density on vowel articulation. Journal of Speech, Language, and Hearing Research, 47, 1048-1058.

Neel, A. T. (2008). Vowel space characteristics and vowel identification accuracy. Journal of Speech, Language, and Hearing Research, 51, 574-585.

Nusbaum, H. C., Pisoni, D. B., & Davis, C. K. (1984). Sizing up the Hoosier mental lexicon: Measuring the familiarity of 20,000 words. Research on speech perception progress report, Vol. 10. Bloomington, IN: Psychology Department, Indiana University.

Oldfield, R. C., & Wingfield, A. (1965). Response latencies in naming objects. Quarterly Journal of Experimental Psychology, 17, 273-281.

Pierrehumbert, J. B. (2001). Exemplar dynamics: Word frequency, lenition and contrast. In Joan Bybee, & Paul Hopper (Eds.), Frequency and the emergence of linguistic structure. Typological studies in language, Vol. 45 (pp. 137-157). Amsterdam, Netherlands: John Benjamins Publishing Company.

Pierrehumbert, J. B. (2002). Word-specific phonetics. In C. Gussenhoven, & N. Warner (Eds.), Laboratory phonology, Vol. VII (pp. 101-140). Berlin: Mouton de Gruyter.

Pisoni, D. B., Nusbaum, H. C., Luce, P. A., & Slowiaczek, L. M. (1985). Speech perception, word recognition and the structure of the lexicon. Speech Communication, 4, 75-95.

Pluymaekers, M., Ernestus, M., & Baayen, H. (2005a). Articulatory planning is continuous and sensitive to informational redundancy. Phonetica, 62, 146-159.

Pluymaekers, M., Ernestus, M., & Baayen, H. (2005b). Lexical frequency and acoustic reduction in spoken Dutch. Journal of the Acoustical Society of America, 118, 2561-2569.

R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna, Austria. ISBN 3-900051-07-0 (http://www.R-project.org).

Salverda, A. P., Dahan, D., & McQueen, J. M. (2003). The role of prosodic boundaries in the resolution of lexical embedding in speech comprehension. Cognition, 90, 51-89.

Scarborough, R. A. (2010). Lexical and contextual predictability: Confluent effects on the production of vowels. In L. Goldstein, D. H. Whalen, & C. T. Best (Eds.), Laboratory phonology, 10 (pp. 557-586). Berlin, New York: De Gruyter Mouton.

Scarborough, R. A. (2013). Neighborhood-conditioned patterns in phonetic detail: Relating coarticulation and hyperarticulation. Journal of Phonetics, 41, 491-508.

Stack, J., Strange, W., Jenkins, J. J., Clarke, W. D., III, & Trent, S. A. (2006). Perceptual invariance of coarticulated vowels over variations in speaking rate. Journal of the Acoustical Society of America, 119, 2394-2405.

Stephenson, L. (2004). Lexical frequency and neighbourhood density effects on vowel production in words and nonwords. In Proceedings of the 10th Australian international conference on speech science & technology (pp. 364-369). Macquarie University, Sydney.

Stevens, K. N., & House, A. S. (1963). Perturbation of vowel articulations by consonantal context: An acoustical study. Journal of Speech and Hearing Research, 6, 111-128.

Strand, J., & Sommers, M. (2011). Sizing up the competition: Quantifying the influence of the mental lexicon on auditory and visual spoken word recognition. Journal of the Acoustical Society of America, 130, 1663-1672.

Strange, W., Weber, A., Levy, E. S., Shafiro, V., & Hisagi, M. (2007). Acoustic variability within and across German, French, and American English vowels: Phonetic context effects. Journal of the Acoustical Society of America, 122, 1111-1129.

Thomas, E. (2001). An acoustic analysis of vowel variation in New World English. Durham, NC: Duke University Press.

Torretta, G. (1995). The "easy-hard" word multi-talker speech database: An initial report. Research on spoken language processing progress report, Vol. 20 (pp. 321-333). Bloomington, IN: Speech Research Laboratory, Indiana University.

Tremblay, A., & Ransijn, J. (2013). LMERConvenienceFunctions: A suite of functions to back-fit fixed effects and forward-fit random effects, as well as other miscellaneous functions (2.5 ed.).

Tunley, A. (1999). Coarticulatory influences of liquids on vowels in English. University of Cambridge.

Uchanski, R. (2008). Clear speech. In D. B. Pisoni, & R. E. Remez (Eds.), The handbook of speech perception (pp. 207-235). Malden, MA: Blackwell Publishers.

Ueda, Y., Hamakawa, T., Sakata, T., Hario, S., & Watanabe, A. (2007). A real-time formant tracker based on the inverse filter control method. Acoustical Science and Technology of the Acoustical Society of Japan, 28, 271-274.

Vinet, A. R. (1870). Homiletics; or, the theory of preaching (3rd ed.). New York: Ivison & Phinney.

Vitevitch, M. S. (2002). The influence of phonological similarity neighborhoods on speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 735-747.

Vitevitch, M. S., Armbruster, J., & Chu, S. (2004). Sublexical and lexical representations in speech production: Effects of phonotactic probability and onset density. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 514-529.

Vitevitch, M. S., & Luce, P. A. (1998). When words compete: Levels of processing in perception of spoken words. Psychological Science, 9, 325-329.

Vitevitch, M. S., & Luce, P. A. (1999). Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language, 40, 374-408.

Vitevitch, M. S., Luce, P. A., Pisoni, D. B., & Auer, E. T. (1999). Phonotactics, neighborhood activation, and lexical access for spoken words. Brain and Language, 68, 306-311.

Vitevitch, M. S., & Sommers, M. S. (2003). The facilitative influence of phonological similarity and neighborhood frequency in speech production in younger and older adults. Memory & Cognition, 31, 491-504.

Watanabe, A. (2001). Formant estimation method using inverse filter control. IEEE Transactions on Speech and Audio Processing, 9, 317-326.

Wright, R. (1997). Lexical competition and reduction in speech: A preliminary report. Indiana University research on spoken language processing progress report no. 21 (pp. 471-485).

Wright, R. (2004). Factors of lexical competition in vowel articulation. In J. Local, R. Ogden, & R. Temple (Eds.), Papers in laboratory phonology, VI (pp. 26-50). Cambridge: Cambridge University Press.

Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS corpus. In Proceedings of acoustics 2008, pp. 5687-5690.