Society for Education, Music and Psychology Research


The effects of absolute pitch ability and musical training on lexical tone perception

Psychology of Music 1-17

© The Author(s) 2014. DOI: 10.1177/0305735614546359

Denis Burnham1, Ron Brooker2,3 and Amanda Reid1


The relationship between processing of speech and music was explored here via the linguistic vehicle of lexical tone. People with amusia have been found to be impaired on linguistic tasks; we examined whether absolute pitch (AP) possessors have an advantage on linguistic tasks. Participants were 3 groups of monolingual Australian-English speakers: non-AP musicians (musically-trained individuals who did not possess AP), AP musicians (musically-trained individuals who were AP possessors), and non-musicians (no musical training). Perceptual discrimination was tested in an AX same-different task for lexical tones presented in three contexts: normal Thai speech, low-pass filtered speech tones, and violin, with processing level manipulated via variation of the interstimulus interval (ISI). Non-musicians showed attenuated pitch discrimination of tones in speech, suggesting speech specialisation. On the other hand, all musicians showed greater accuracy, faster reaction times and less variation in accuracy across stimulus types than non-musicians. Importantly, AP musicians showed greater accuracy than non-AP musicians in the speech context, implying a domain-general advantage due to AP. However, speech-violin accuracy correlations for AP musicians were almost zero at the longer ISI, suggesting less commonality of mechanisms during more extensive processing. Results are discussed in terms of the role of AP in tone language perception.


Keywords: absolute pitch, discrimination, musician, perception, tone language

Over the past decade, a growing body of research has emerged regarding language-music relations. Investigating the extent to which there is specificity or overlap in the cognitive

1MARCS Institute, University of Western Sydney, Australia; 2Sydney Conservatorium of Music, University of Sydney, Australia; 3School of Medicine, University of Western Sydney, Australia

Corresponding author:

Denis Burnham, University of Western Sydney, Locked Bag 1797, Penrith, Sydney, New South Wales 2751, Australia.


mechanisms involved in processing language and music can enhance our understanding of each domain. Language and music have obvious structural similarities: they are both "rule-based systems composed of basic elements (phonemes, words, notes and chords) that are combined into higher-order structures (musical phrases and sentences, themes and topics) through the rules of harmony and syntax" (Besson & Schon, 2001, p. 235). Some researchers (Peretz, 2012; Peretz & Coltheart, 2003) emphasise the separation and "modular" nature of language and music in the brain, while others (Patel, 2003, 2008) stress their overlap. In response to contradictory evidence regarding some aspects of language and music (e.g. syntax), Patel (2003, 2008, 2012a) has proposed a "resource sharing" framework, in which domain-specific representations for music and language are stored separately in long-term memory (LTM), but with deep connections in the cognitive processes that operate upon these. With regard to the focus of this article, pitch processing, Patel (2011, 2012b) states that it is likely that common brainstem circuits are involved in processing pitch in speech and music. Further, in his OPERA hypothesis Patel (2011, 2012b) proposes that musical training may drive subcortical pitch-encoding networks to function with higher precision than needed for ordinary speech processing, leading to benefits for speech processing because of the cognitive processes that music and speech share.

A valuable medium for exploring the nature of the cognitive and neural relations between pitch in speech and music is provided by tone languages (Zatorre & Gandour, 2008). In approximately 70% of the world's languages, which are spoken by more than half of the world's population, meaning is distinguished by changes in lexical tone (Fromkin, 1978; Yip, 2002). Thai for example, has five lexical tones; [kha:]-rising tone means "leg," [kha:]-falling tone means "to kill," [kha:]-high tone means "to trade," [kha:]-mid tone means "to be stuck," and [kha:]-low tone means "a kind of spice." Lexical tones are best described by fundamental frequency (F0) height, and contour over time, perceived as pitch level and pitch movement respectively. Other cues, e.g., voice quality and duration, contribute to the production and perception of lexical tone to a lesser extent; however it is F0 differences which are the main distinguishing feature of lexical tone and which are of major concern here.

In some circumstances, speech-to-music effects associated with tone language experience are evident (Bidelman, Gandour, & Krishnan, 2011; Bidelman, Hutka, & Moreno, 2013; Wong et al., 2012). Native Cantonese (tone) speakers show enhanced pitch perception compared to Canadian French and English (non-tone) speakers (Bidelman et al., 2013; Wong et al., 2012). Stevens, Keller, and Tyler (2013) reported that native Thai-speakers were significantly faster when discriminating musical contours and intervals than native English-speakers, although accuracy was equivalent. In contrast, Mandarin-, Vietnamese-, and Cantonese-speakers were more accurate at imitating musical pitch and discriminating intervals than English-speakers (Pfordresher & Brown, 2009), although there was no advantage in discrimination of individual notes. Bent, Bradlow, and Wright (2006) found no advantage for Mandarin compared with English listeners on a pitch discrimination task requiring fine-grained discrimination of simple non-speech sounds. However, Mandarin listeners had more difficulty than English-language listeners in identifying particular non-speech target sounds with certain falling and flat pitch contours, which could be due to interference from established Mandarin tone category boundaries. Thus there may be both positive and negative speech-to-music effects associated with tone language experience.

Turning to the inverse, music-to-speech effects, musical training or expertise has been found to be associated with improved pitch perception in tone language material (Alexander, Wong, & Bradlow, 2005; Cooper & Wang, 2010; Delogu, Lampis, & Olivetti Belardinelli, 2006, 2010; Gottfried & Xu, 2008; Lee & Hung, 2008; Lee & Lee, 2010; Re, Behne, & Wang, 2006; Wong, Skoe, Russo, Dees, & Kraus, 2007). For example, Alexander et al. (2005) found discrimination accuracy of Mandarin tones by English-speaking musicians was 87% - almost at the level of

Mandarin-speakers' accuracy (89%), compared with 71% accuracy by English-speaking non-musicians. Similar effects were found in reaction times. Wong, Skoe, Russo, Dees, and Kraus (2007) measured physiological responses to Mandarin tones for both amateur musicians and non-musicians, and found more robust and faithful pitch tracking among musicians, and a correlation between brainstem pitch tracking and amount of musical experience. In contrast, Hung and Lee (2008; Lee & Hung, 2008) found only weak correlations between musician performance on an absolute pitch task and accuracy in Mandarin tone identification, despite a musician advantage in the latter. Further, in a physiological study, Bidelman et al. (2011) found that while gross measures pointed to equivalent neural representation for Chinese tone-language speakers and musicians, fine-grained analyses showed that each group encoded particular elements of pitch patterns that were perceptually relevant to their domain of expertise.

The results of several recent studies of musical deficit, amusia (or tone-deafness), in both non-tone language and tone language-speakers support the case for domain-general pitch perception across music and speech (Jiang, Hamm, Lim, Kirk, & Yang, 2010; Nan, Sun, & Peretz, 2010; Tillmann, Burnham et al., 2011; Tillmann, Rusconi et al., 2011). Mandarin speakers with amusia have been found to be impaired compared with Mandarin controls both on melodic contour tasks, and also on Mandarin speech and non-linguistic filtered tone analogue tasks (Jiang et al., 2010). This shows that amusia extends to speech and other pitch-related tasks -evidence for domain-general pitch perception across music and speech, and also suggesting that tone language experience does not compensate for this supposedly musical deficit (but see also Wong et al., 2012).

Here we investigate whether Absolute Pitch, a specific form of pitch ability, is associated with improved pitch perception in tone language material. Absolute Pitch (AP) is traditionally defined as the ability to identify the chroma (pitch class) of a tone presented in isolation (e.g. middle C), or to produce a specified musical pitch without external reference (Levitin & Rogers, 2005; Parncutt & Levitin, 2001). It is different from Relative Pitch (RP), an ability that most people have that allows the identification or production of musical intervals or relations between pitches, and in fact AP may interfere with RP in some circumstances (Miyazaki, 1993). AP is rare; it occurs in only 1 in 10,000 people (Ward, 1999). One speculation for the genesis of AP is that the ability is latent and requires activation and training during a critical period (analogous to the critical period for language acquisition; Levitin, 1999), roughly between birth and 6 years of age (Cohen & Baird, 1990). Accordingly, the vast majority of AP possessors report musical training before their fifth or sixth year (Takeuchi & Hulse, 1993), the period of life when people are developing their native language. However, while possibly necessary, childhood musical experience is not sufficient for AP development, as the vast majority of children given early musical training do not develop AP (Plantinga & Trainor, 2005), and research suggests that complex genetic factors also contribute to its manifestation (Baharloo, Service, Risch, Gitschier, & Freimer, 2000; Theusch & Gitschier, 2011). It has been suggested that AP is probably manifested in a similar fashion to other labelling abilities in the developing child's vocabulary (Levitin & Rogers, 2005); the verbal labelling of pitches necessarily involves speech and language. However, Schellenberg and Trehub (2008) argue that it is important to distinguish pitch labelling (restricted to individuals with musical training) from pitch memory. 
Defined less stringently in this way, AP could in fact be relatively widespread, given that adults with little or no musical training can distinguish the original version of familiar instrumental recordings from those shifted upward or downward by one or two semitones (Schellenberg & Trehub, 2003). Nevertheless, here we use the traditional definition of AP reflecting pitch labelling.

Deutsch and colleagues argue that fluency in speaking a tone language exerts an influence on the predisposition to acquire AP (Deutsch, Dooley, Henthorn, & Head, 2009). Controlling for

gender and age of onset of musical training, Deutsch, Henthorn, Marvin, and Xu (2006) found greater prevalence of AP among Chinese music conservatory students compared with US music conservatory students. Although Schellenberg and Trehub (2008) did not find an association between tone language and pitch memory among Canadian 9-12-year-olds of Asian (Chinese) versus non-Asian (European) heritage, it remains possible that there may be a relationship between lexical tone experience and pitch labelling.

There is no research to our knowledge that has examined whether absolute pitch in the musical domain extends to perception of lexical tone. Tone perception is tested here using an AX discrimination task in three contexts - speech, low-pass filtered speech tones, and violin -with non-tone language speakers who are (a) non-AP musicians (defined as musically-trained individuals who do not possess AP), (b) AP musicians (musically-trained individuals who are AP possessors), and (c) non-musicians (with no musical training). If AP is a domain-general ability, then AP musicians should perform better than non-AP musicians in the speech context. Moreover, if there is greater commonality of speech-music processing in AP possessors than those without AP, we might expect higher speech-violin correlations in AP musicians than the other two groups. In order to examine the relationship between speech and violin tone discrimination under different processing conditions, we also used two inter-stimulus intervals (ISIs): 500 ms and 1500 ms. The AX task requires that information be held in echoic memory in order to compare two sequential acoustic patterns, so longer ISIs should allow more elaborate processing. Indeed, these two ISIs have been found to force different levels of processing of speech stimuli (Werker & Logan, 1985; Werker & Tees, 1984). Here, the use of these two ISIs allows investigation of whether processing level interacts with AP ability, musical training and stimulus context.


Method

Participants
A total of 72 native English-speaking participants participated, each with little or no experience of other languages (1 or 2 years of high school maximum) and no experience at all with Thai or any other tone language. Of the 72, 48 were musicians (musically-trained individuals), selected from two music-training institutions in Sydney, Australia. Half (24) were musicians who did not possess AP (Non-AP musicians) and the other 24 were musicians who were AP possessors (AP musicians). Musicians identified as possible AP possessors by experienced Aural Perception academics at the two participating institutions were tested on their AP perception on measures devised by Watson (1995) at the Sydney Conservatorium of Music. All AP musicians included in the study achieved above 90% accuracy on three timed AP tasks - pitch naming, pitch producing, and musical key identification (M = 96%; SD = 3.8%). Those who were not AP possessors (defined as those who achieved 90% or below) were placed in the Non-AP musicians group. The remaining 24 participants were Introductory Psychology students with no musical training (Non-musicians).

Reflecting the sex ratio of the subject pool, the Non-musicians consisted of 19 females and 5 males (mean age = 26.3 years; range = 18.1-50.4). The Non-AP musicians consisted of 13 females and 11 males (mean age = 25.5 years; range = 18.4-48.8) and the slightly younger AP musician group consisted of 16 females and 8 males (mean age = 20.6 years; range = 14.1-32.5). Mean years of musical training for the Non-AP and AP musician groups were similar: 12.2 years (range 4-29) and 14.7 years (range 8-27), respectively. Mean hours of practice per week for the Non-AP and AP musician groups were 9.1 hours (range 2-20) and 11.4 hours (range 0-24), respectively. Among the Non-AP musicians, 13 played piano, three played strings, six played woodwind, percussion, or brass, and two were vocalists. The AP musician group contained a relatively larger proportion of string players: 12 played piano, 10 played strings and the other two participants played woodwind or percussion.

The experiment was conducted at the University of NSW, the Sydney Conservatorium of Music, the University of Sydney, and the Australian Institute of Music.

Stimuli
Three stimulus sets were created, Speech, Filtered Tones, and Violin, each comprising three exemplars of each of the five Thai tones. The original speech stimuli were recorded from a female native Thai speaker speaking the syllable [pa:] carrying the five tones: rising [pa:], high [pa:], mid [pa:], low [pa:], falling [pa:], (R, H, M, L, F, respectively). Three of these (H, M, L) have been categorised as Static (or Level) tones and two (R, F) as Dynamic (or Contour) tones (Abramson, 1978). From these speech recordings, the three best exemplars (adequate volume, no false starts, no breaks in the voice, similar duration, etc.) of each of the five tones were chosen for the experiment. These 15 Speech stimuli were then used to create the 15 Filtered Tones and the 15 Violin stimuli.

The Filtered Tones stimuli were created by digitally low-pass filtering the speech sounds, to remove all frequencies above 270 Hz. This reduced the upper formant information while leaving the F0 intact, so that the phonological, syntactic, and semantic elements of speech were no longer present. Stimuli were filtered three times.
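The filtering step can be sketched as follows. The paper specifies only the 270 Hz cutoff and the three filter passes; the filter order and type below (a Butterworth design applied with zero-phase filtering) are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_270(signal, fs, order=4, passes=3):
    """Low-pass filter a speech waveform at 270 Hz, applied in three
    passes as described for the Filtered Tones stimuli. The 4th-order
    Butterworth design and zero-phase (forward-backward) filtering are
    assumptions; the paper does not specify the filter used."""
    sos = butter(order, 270, btype="low", fs=fs, output="sos")
    for _ in range(passes):
        signal = sosfiltfilt(sos, signal)
    return signal
```

Repeated passes steepen the roll-off, so energy above 270 Hz (the upper formants) is strongly suppressed while the F0 region of the female voice (roughly 138-227 Hz here) passes largely unchanged.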

The Violin stimuli were created on a violin because the violin can both maintain a continuous sound and reproduce rapid pitch changes, e.g. the pitch dynamics of the Thai falling tone which covers approximately one and a half octaves in a short space of time. A professional violinist listened extensively to the speech recordings and then reproduced approximately 25 exemplars of each of the five tones on the violin. From these, three violin exemplars for each of the five tones were selected, on the basis of comparison (using the Kay Elemetrics CSL analysis package) of the pitch plots of the original lexical tone and the violin sounds, with due regard to, and control of, sound duration. Across the 5 tones x 3 exemplars, the frequency range for speech was 138-227 Hz. In contrast, the frequency range for violin stimuli was higher: 293-456 Hz (in musical terms, between about D4 and A4; "Middle C" is C4, 261 Hz and the lowest note on a violin is G3, 196 Hz). Although the sounds were not conventional musical notes, the sounds were recognisable as being produced by a violin.
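For readers checking the note names quoted above, a minimal 12-tone equal temperament conversion (assuming the A4 = 440 Hz standard; not part of the original study) maps a frequency to its nearest note:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note(freq, a4=440.0):
    """Return the nearest 12-TET note name for a frequency in Hz.
    A4 is MIDI note 69; octave numbers increment at C."""
    semitones = round(12 * math.log2(freq / a4))
    midi = 69 + semitones
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
```

On this mapping, 261 Hz falls nearest C4, 196 Hz nearest G3, and the 293-456 Hz violin range spans roughly D4 to A4, as stated in the text.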

Figure 1 shows the F0 tracks of corresponding speech, filtered tones and violin stimuli. It can be seen that the F0 contours in the violin stimuli here are not exactly the same as those in the speech stimuli. In this regard, the filtered tones condition is useful because the F0 contours are identical to those in the speech condition, while they presumably invoke fewer language-specific processes.

Apparatus
The experiment was conducted on a laptop computer and an in-house program, MAKEDIS, was used to control presentation and timing of the sounds and an attached response panel containing a "same" and a "different" key was used to record participants' responses and reaction times. A set of coloured feedback lights was used during the training phase.

Figure 1. F0 distribution of speech, filtered tones and violin stimuli on each Thai tone, shown with normalised pitch

Procedure
Each participant completed three AX discrimination tasks, identical except for the stimulus type employed, Speech, Filtered Tones, or Violin. In each, the participant first listened to a 1-minute familiarisation "context" recording (a woman conversing in Thai, a concatenation of filtered tones excerpts, and a violin recording of Canon No. 1 in Bach's Musical Offering,1 respectively). The familiarisation recordings were designed to differentiate between the three contexts - for example, the violin stimuli (which were isolated glides based on Thai tones) were intended to be processed in a more "music-like" than a "speech-like" manner, hence a violin music recording was used to help establish this perceptual set. Participants then completed a training phase, in which they were required to respond correctly on four simple auditory distinctions, two "same" and two "different" AX pairs of rag and rug [ɹæg, ɹʌg]. Two 40-trial test blocks were then given for each stimulus type, with five of the possible 10 different contrast pairs presented in the first block, and the other five in the second block. Thus there were six 40-trial blocks altogether (3 stimulus types x 2 sets of 5 contrast pairs). The order of Stimulus Type blocks was counterbalanced between participants, as was the ordering of each 40-trial block within Stimulus Type. The 10 tone contrasts comprised three Static-Static comparisons (M-L, M-H, L-H), six Static-Dynamic comparisons (M-R, M-F, L-R, L-F, H-R, H-F), and one Dynamic-Dynamic comparison (R-F). For each contrast pair (e.g. L-F), each of the four possible stimulus x order combinations, AA, BB, AB, BA (e.g., L-L, F-F, L-F, F-L), were presented twice. Thus in each block of 40, there were 20 "same" and 20 "different" trials. The actual exemplars of each tone on any particular trial were selected randomly by the computer from the pool of three possible exemplars, in order to discourage processing based on idiosyncratic acoustic properties.
Participants were required to listen to stimulus pairs and respond by pressing either the "same" or "different" key within 1000 ms.
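The block structure described above can be sketched as follows; the function and variable names are illustrative, not taken from the original MAKEDIS program, and the randomisation scheme is an assumption.

```python
import random

def make_block(contrast_pairs, exemplars, rng=random):
    """Build one 40-trial AX block: for each of 5 contrast pairs
    (e.g. ('L', 'F')), present each stimulus x order combination
    AA, BB, AB, BA twice, giving 20 'same' and 20 'different' trials.
    Each trial's exemplars are drawn randomly from the pool of three
    per tone, to discourage reliance on idiosyncratic acoustics."""
    trials = []
    for a, b in contrast_pairs:
        for first, second in [(a, a), (b, b), (a, b), (b, a)] * 2:
            trials.append((rng.choice(exemplars[first]),
                           rng.choice(exemplars[second]),
                           "same" if first == second else "different"))
    rng.shuffle(trials)
    return trials
```

With the five contrasts of one block (e.g. M-L, M-H, L-H, M-R, M-F) this yields 5 pairs x 4 combinations x 2 repetitions = 40 trials, half "same" and half "different".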

Two ISIs were used: 500 ms and 1500 ms. Half of the participants in each group were tested at the former, and the other half at the latter. The stimuli, apparatus and procedure are the same as in Burnham et al. (in press, Experiment 2), thus affording direct comparison of the results of that and this experiment.

A d′ score was calculated for each of the 10 tone pairs in each condition, given by d′ = Z(hit rate) − Z(false-positive rate), with appropriate adjustments made for probabilities of 0 and 1. Hits are defined as the number of correct responses ("different" responses on AB or BA trials). False positives are defined as the number of incorrect responses ("different" responses on AA or BB trials).
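A minimal d′ computation consistent with this description might look like the following; the specific correction for rates of 0 and 1 (here the common 1/(2N) convention) is an assumption, as the paper does not state which adjustment was used.

```python
from scipy.stats import norm

def d_prime(hits, n_different, false_alarms, n_same):
    """d' = Z(hit rate) - Z(false-positive rate), where Z is the
    inverse normal CDF. Rates of exactly 0 or 1 are clamped to
    1/(2N) and 1 - 1/(2N), a standard convention (assumed here)."""
    def adjust(p, n):
        return min(max(p, 1 / (2 * n)), 1 - 1 / (2 * n))
    hit_rate = adjust(hits / n_different, n_different)
    fa_rate = adjust(false_alarms / n_same, n_same)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)
```

For example, a perfect block (20/20 hits, 0/20 false positives) yields d′ ≈ 3.92 under this adjustment rather than an undefined infinite score, while chance performance yields d′ = 0.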

Reaction times (RTs) were recorded from the onset of the second stimulus in an AX pair until the participant pressed either the "same" or "different" key. Only RTs for correct "different" responses were analysed, that is, "different" responses on trials in which the AX pair were indeed different, AB or BA trials.

Results
The d′ and the RT data were analysed in separate 3 x 2 x (3) ANOVAs with the design: Group (AP Musicians, Non-AP Musicians, Non-musicians) x ISI (500, 1500 ms) x Stimulus Type (Speech, Filtered Tones, Violin), the last factor with repeated measures. It was hypothesised that Musicians (AP + Non-AP) would show an advantage in both d′ and RT over Non-musicians, and that AP musicians would show a further advantage over Non-AP musicians. It was expected, based on past studies showing musicians' over non-musicians' advantage with tone language material (Alexander et al., 2005; Cooper & Wang, 2010; Delogu et al., 2006, 2010; Gottfried & Xu, 2008; Lee & Hung, 2008; Lee & Lee, 2010; Re et al., 2006; Wong et al., 2007), that these advantages should apply over speech and non-speech material, and to test this, specific Stimulus Type contrasts were tested. Planned contrasts tested on the Group factor were: All Musicians (AP + Non-AP) vs. Non-musicians, and AP vs. Non-AP musicians; and on the Stimulus Type factor: Speech vs. Non-Speech (Filtered Tones + Violin), and Filtered Tones vs. Violin, along with all 2- and 3-way interactions.


Discrimination accuracy data

Mean d′ scores for the three groups on each of the three stimulus types are shown in Figure 2. There was no overall effect of ISI, and no significant interactions with ISI. The two Musician groups performed significantly better overall than the Non-musicians (M = 2.42), F(1, 66) = 21.18, p < .001, partial η2 = .24, and the AP Musicians (M = 3.41) performed better than the Non-AP musicians (M = 2.89), F(1, 66) = 7.98, p = .006, partial η2 = .12. With respect to Stimulus Type, scores were significantly higher overall for the two Non-Speech stimulus sets than Speech (M = 2.67), F(1, 66) = 24.61, p < .001, partial η2 = .27, and higher for Violin (M = 3.15) than Filtered Tones (M = 2.91), F(1, 66) = 7.17, p = .009, partial η2 = .10. Most interesting are two significant Group x Stimulus Type interactions: (i) All Musicians vs. Non-musicians with Speech/Non-Speech, F(1, 66) = 9.31, p = .003, partial η2 = .12, and (ii) AP musicians vs. Non-AP musicians with Filtered/Violin, F(1, 66) = 6.52, p = .013, partial η2 = .09. This complex of interactions is best understood as follows. As can be seen in Figure 2, Non-musicians' performance significantly improved as stimuli became less speech-like, from Speech to Filtered Tones, F(1, 23) = 8.68, p = .007, partial η2 = .27, and Filtered Tones to Violin, F(1, 23) = 10.52, p = .004, partial η2 = .31. This pattern was not observed for Musicians: for both the Non-AP musician (Speech vs. Filtered Tones, F(1, 23) = .007, p = .935; Filtered Tones vs. Violin, F(1, 23) = 5.64, p = .026) and AP musician groups (Speech vs. Filtered Tones, F(1, 23) = 5.88, p = .024; Filtered Tones vs. Violin, F(1, 23) = .911, p = .350), there was less difference between the three stimulus types than for the Non-musicians.2 AP musicians were significantly more accurate than Non-AP musicians on Speech, F(1, 46) = 6.91, p = .012, and on Filtered Tones, F(1, 46) = 14.14, p < .001, but this superiority was attenuated on Violin, F(1, 46) = 1.82, p = .184.3

Figure 2. Non-musicians', Non-AP musicians' and AP musicians' mean discrimination indices for speech, filtered tones and violin tone stimuli (with standard errors)

Table 1. Non-musicians', Non-AP musicians' and AP musicians' speech-violin accuracy (d′) correlations and speech-violin reaction time correlations at ISIs of 500 and 1500 ms.

Speech-violin r      Non-musician   Non-AP musician   AP musician
Accuracy, 500 ms     .80**          .13               .38
Accuracy, 1500 ms    .41            .42               .07
RT, 500 ms           .73**          .67*              .85**
RT, 1500 ms          .79**          .57               .89***

*p < .05; **p < .01; ***p < .001.

Speech-Violin d′ correlations shown in the upper half of Table 1 can be seen to differ as an interactive function of ISI and Group.4 All correlations were positive, but the highest and only significant correlation was that for the Non-musician group at 500 ms, r = .80, p < .01. Most striking are the relatively low correlations at both ISIs for the Non-AP Musician and AP Musician groups. The AP Musician pattern is interesting given that despite highly accurate and almost equal discriminative ability for Speech and Violin sounds (with almost no change across ISI), the correlation at 1500 ms ISI was nearly zero. However, we note that the sample size in each ISI group is low (n = 12) and there may be a ceiling effect in accuracy for the two Musician groups, which would result in low correlations and which could limit interpretation of these correlations.

Reaction time data

Figures 3a and 3b show mean RTs (in milliseconds) for correct "different" responses on AB trials for the 3 groups, the 3 stimulus types and at each ISI. Overall, all Musicians were significantly faster than Non-musicians by around 100 ms, F(1, 66) = 86.04, p < .001, partial η2 = .57; but there was no significant difference between the Non-AP and AP musicians overall, F(1, 66) = 0.13, p = .717. In addition, there were ISI effects: a significant main effect of ISI, F(1, 66) = 4.58, p = .036, partial η2 = .06, indicating that participants generally responded faster at 1500 ms ISI than at 500 ms ISI (M = 722 ms and M = 749 ms, respectively); and a significant AP vs. Non-AP musicians x ISI interaction, F(1, 66) = 16.27, p < .001, partial η2 = .20. With respect to the latter, Figures 3a and 3b show that the AP musicians were more polarised in their responses than Non-AP musicians: they were much slower than Non-AP musicians at 500 ms ISI, F(1, 22) = 7.89, p = .010, and much faster to respond at 1500 ms ISI, F(1, 22) = 8.96, p = .0075, particularly for Violin as opposed to Filtered Tones, F(1, 66) = 15.96, p < .001, partial η2 = .20. Figures 3a and 3b also show that responses were faster overall for Speech (M = 727 ms) than Non-Speech (MFiltered = 742 ms; MViolin = 737 ms), F(1, 66) = 10.22, p = .002, partial η2 = .13, although this was qualified by a significant Speech/Non-speech x ISI interaction, F(1, 66) = 5.95, p = .017, partial η2 = .08. Overall, Speech was significantly faster than Non-speech at 500 ms (M = 734 ms; M = 756 ms), F(1, 35) = 17.849, p < .001, partial η2 = .338, but not at 1500 ms ISI (M = 720 ms; M = 723 ms), F(1, 35) = .253, p = .618.



Figure 3. Non-musicians', Non-AP musicians' and AP musicians' mean correct reaction times to "different" stimuli pairs for speech, filtered tones and violin tone stimuli (with standard errors) at ISIs of (a) 500 ms and (b) 1500 ms

At each ISI, speech-violin RT correlations were highest among the AP musician group (see lower half of Table 1); correlations were also similar at each ISI for both Non-AP musicians and Non-musicians.

Discussion
It has been shown here that non-musician non-tone language speakers discriminate lexical tone frequency contrasts better in a violin than in a filtered tone context, and in turn better than in a speech context. Burnham et al. (in press) obtained the same results for another group of non-musician non-tone language speakers and also showed that, in contrast, tone language-speakers performed generally better than non-tone language-speakers and were equally good at discriminating lexical tones in speech, filtered tones, and violin contexts. In parallel results here, it was also shown that trained musicians perform generally better than non-musicians at tone discrimination, with trained musicians with AP having an added advantage, and that neither group shows the graded response across contexts typical of non-tone language-speakers. Thus, it can be concluded that, as is the case for tone language experience (Burnham et al., in press), tone discrimination accuracy is facilitated by (a) musical training experience (a result in accord with previous positive music-to-speech effects; Alexander et al., 2005; Cooper & Wang, 2010; Delogu et al., 2006, 2010; Gottfried & Xu, 2008; Lee & Hung, 2008; Lee & Lee, 2010; Re et al., 2006; Wong et al., 2007), and (b) over and above this, absolute pitch ability.

These results can be taken as evidence for a specialised linguistic mode of pitch perception in non-tone language non-musicians. The superior performance by AP and non-AP musicians over non-musicians suggests that the attenuation of lexical pitch discrimination that normally occurs for non-tone language speakers in infancy (Mattock & Burnham, 2006; Mattock, Molnar, Polka, & Burnham, 2008) and carries through to adulthood (Burnham et al., in press) is either reduced or reversed as a function of musical training and AP ability. The fact that tone-language learning and musical experience have similar facilitative effects on the perception of lexical tones strongly suggests that speech and music processing occur in a common domain, but further research on the details of this facilitation is required before fine-grained conclusions are possible. For example, further investigation is required to determine whether non-tone language non-musicians are able to learn to discriminate lexical tone better on the basis of brief training, which might weigh against a strong form of the hypothesis for specialised speech processes for pitch perception.

The RT data show that musical training provides a speed advantage when detecting pitch differences - RTs for both groups of musicians were notably faster on all three stimulus types than those of the non-musicians. This cannot be explained by a simple speed-accuracy trade-off, because both musician groups were also more accurate than non-musicians. The use of distinct interval-based categories in music - intervals based on specific F0 values for specific notes (e.g., Middle C = 261.626 Hz) - might facilitate fast assignment of consecutive pitches to same or different categories. Note, however, that musicians may simply be faster at processing auditory stimuli in general, in which case this RT advantage would not be specific to pitch tasks.

The findings here extend previous research on musical training and prosody in non-tone languages, which has found that professional French musicians were not only better at detecting weak pitch variations (final word F0 in a sentence) in non-native Portuguese speech, they were also faster than non-musician French participants (Marques, Moreno, Castro, & Besson, 2007). In summary, both the accuracy and the RT data for AP and non-AP musicians suggest some cross-modal transfer, some common processing mechanisms or domain-generality of speech and music with regard to pitch.

Within the musician groups, the AP musicians showed greater accuracy overall than musicians without AP, although the advantage was diminished for violin stimuli. The augmented accuracy for lexical tones in speech due to AP, over and above musical training, has not been observed before. Additionally, the fact that the AP musicians performed well across all stimulus contexts suggests that AP may be a domain-general phenomenon. Although AP is usually considered to be a pitch-related ability specific to musical sounds, such a domain-general conclusion is consistent with results for what could be considered the opposite of AP, amusia: Tillmann, Burnham et al. (2011) found that amusics performed significantly worse than controls on Mandarin and Thai lexical tone discrimination, as well as on musical analogues (equivalent to the violin stimuli here).

AP musicians also had a speed (RT) advantage over non-AP musicians, but only at the 1500 ms ISI at which deeper processing is presumably elicited, and not at 500 ms ISI, at which processing may be shallower. This faster reaction at 1500 ms by AP over non-AP musicians is consistent with the view that AP involves long-term pitch memory that draws on internal pitch standards (a template) to identify tones (Parncutt & Levitin, 2001).

The speech-violin correlations highlight the possible influence of subtle task conditions such as ISI when investigating modularity. The significant d′ correlation at the shorter ISI for non-musicians suggests that at a shallow, early level of processing, domain-general mechanisms operate in the perception of pitch in all contexts - speech, music, and filtered tones. A similar experiment conducted in our laboratory with a much larger sample of non-musicians (Burnham et al., 1996; Burnham et al., in press) also produced a large drop in speech-violin correlations at the longer ISI, supporting the results found here.

The AP musicians' d′ correlation was almost zero at the longer ISI, implying less commonality of speech and music mechanisms at deeper processing levels, which are presumably associated with greater involvement of long-term memory (LTM) and greater use of categorical stores for auditory information. This low correlation is prima facie inconsistent with the fact that these AP musicians discriminated sounds well regardless of stimulus type; the former suggests reduced commonality of speech and music processing, whereas the latter implies some kind of domain-generality or cross-modal transfer. This is not what we predicted, and it resembles the interpretation issue faced by Jiang et al. (2010) when discussing amusia - they concluded that there was evidence of domain-generality, yet the speech-music correlation for people with amusia was low and non-significant (r = .19). These conflicting results may suggest the establishment of domain-general learning mechanisms in development as categories are created, or domain-general cognitive processing resources that operate on each category, but that at the LTM level, true commonality of mechanisms (domain-generality) does not exist.

In this experiment, multiple tone exemplars from a single speaker were used to encourage phonetic processing rather than processing based on slight acoustic differences. However, the AP musicians picked up on and commented on these slight differences between exemplars on "same" trials (while still understanding the task). This might have occurred even more had multiple-speaker stimuli been employed. In this light, it is interesting that performance on an absolute pitch task was found to be uncorrelated with the performance of Mandarin-speaking musicians (72% of whom were AP possessors) on a lexical tone identification task with multiple-speaker stimuli (Lee & Lee, 2010). While Deutsch, Henthorn, and Dolson (2004) have suggested that tone language speakers employ AP in speech processing, the above results suggest that AP may in fact be injurious to real-world lexical tone perception, in which correct perception of single lexical tones depends upon the pitches of surrounding tones as well as each speaker's individual pitch range. Schellenberg and Trehub (2008) highlight that

...the perceptual template for lexical tones must be flexible, much like the template for recognising the same tune at different pitch levels. Tone-language speakers may use some form of absolute processing for producing tones and relative processing for decoding the tones of other speakers. (p. 242)

Indeed, evidence is emerging that relative pitch ability is higher among non-musician East Asians than Caucasians (Hove, Sutherland, & Krumhansl, 2010), a phenomenon that may have genetic origins (Dediu & Ladd, 2007) and which could facilitate acquisition of tone languages. So it is possible that trained musicians without AP (but who would have good relative pitch) may perform better than AP musicians when perceiving real-world tone stimuli. To test this, future tone-discrimination studies should use stimuli in which the variability within tone categories is systematically manipulated via different numbers and types (male/female) of speakers. Relative pitch ability was not assessed here, but further studies should include this variable in order to rule out any systematic differences between AP musicians and non-AP musicians.6

The exact nature of musical experience may also be relevant in future experiments; Rø et al. (2006) found lower accuracy on native Norwegian speech stimuli for singers than for instrumentalists, although both showed higher accuracy than non-musicians on Mandarin linguistic and hummed tones. As Lee and Lee (2010) also found that piano stimuli tended to produce higher identification accuracy than viola stimuli, future research could take both these issues into account. It is of possible relevance here that there was a relatively large proportion of string players in the AP musician group compared with the non-AP musicians, as string players may be more accustomed to fine modulations of pitch on their instruments.

In this experiment, we used a short familiarisation "context" recording before each stimulus type (speech, filtered tones, or violin) was tested. We cannot say whether this had any effect in terms of prompting a musical perceptual set for violin stimuli; however, there are two related types of experiments that may prove fruitful in future research. First, perceptual set-type experiments could be conducted in which participants are presented with a series of discrimination trials with speech stimulus tone pairs, with occasional music tone pair probe trials interspersed at random intervals. If speech trials set in place a lexically-based strategy for processing incoming sounds, and this is then applied to incoming music stimuli, then on the music probe trials non-musician non-tone language listeners should show the same "tone deafness" in music as they do in speech. Conversely, if music stimulus tone pairs are the norm, then probe speech trials should reveal whether (a) a music-based strategy is induced that overrides the native language bias, such that tone in speech is perceived better after this musical familiarisation, or (b) the native language bias is so strong that it overrides any familiarisation-induced music-based strategy. Second, and in a similar vein, in order to investigate the temporal parameters of any effect of speech on music (or vice versa), stimuli presented in mixed AX pairs (music-speech, or speech-music) could be used in the same discrimination paradigm as used here, including the ISI variations used here (500 ms and 1500 ms). Again, the results would have implications for the issues of "tone deafness" induced by linguistic bias (in speech-music pairs) and the robustness of speech bias (in music-speech pairs), this time at a finer temporal level.

Finally, in the examination of speech-music relations here, we have only examined one aspect of music, i.e., pitch. Therefore, the results cannot shed light on overlap or independence in processing of temporal (rhythm, tempo, meter) or other melodic and harmonic qualities of speech and music. Further, only one aspect of pitch in speech was examined - pitch on individual syllables rather than prosody, or sentential stress. Further research is required to examine more global pitch patterns (intonation) in sentences including both linguistic and emotional prosody (Hirst & Di Cristo, 1998; Thompson, Schellenberg, & Husain, 2004) in both non-tone and tone languages, and non-tone and tone language-speakers.

In summary, non-tone language non-musicians have attenuated discrimination of pitch differences in lexical tones, indicative of language-specific speech specialisation (see also Burnham et al., in press). However, musical training appears to immunise against or compensate for this specialisation: such individuals show more accurate and faster tone discrimination overall, and do not show the same attenuation of pitch perception in lexical contexts as their non-musician counterparts. Over and above musical training, musicians who possess absolute pitch have a further accuracy advantage, suggesting that AP may be domain-general and not restricted to the musical modality. The results of this study show that musical training and absolute pitch ability are associated with speech perception in various intricate ways - just how and when this association arises in development awaits further investigation.


Results of this experiment were first presented at a conference by Burnham and Brooker (2002). We also thank Barbara Tillmann and Peter Keller for their comments on previous drafts of this article.


This work was supported by an ARC Discovery grant to the first author (Burnham, D., Kuratate, T., McBride-Chang, C., Mattock, K., DP0988201); and ARC Large grants to the first author (A00001283, A79601993).

1. The violin familiarisation recording was Canon No.1 (a 2 cancrizans) in Bach's Musical Offering (BWV 1079), recorded by Dene Olding to demonstrate violin for an ABC radio broadcast "The Science of Music: Strings", presented by Joe Wolfe.

2. An alpha level of .05/6 = .008 was used.

3. An alpha level of .05/3 = .0167 was used.

4. Note that to reduce the array of correlations reported, correlations only include Speech and Violin, not Filtered Tones, as these are the extreme points on the Speech-Non-Speech continuum.

5. An alpha level of .05/2 = .025 was used.

6. Care must also be taken to identify participants with Quasi-AP; Quasi-AP is said to be less spontaneous than "true" AP and responses are slower, probably involving reference to a single learned reference pitch (see Takeuchi & Hulse, 1993). Here, all of the sub-tests in the Watson (1995) AP test were timed, so this would have distinguished between "true" AP possessors and Quasi-AP possessors. In this study, the focus is on "true" AP possessors, and the fact that we do find group differences in discrimination accuracy between the two musician groups underlines the relevance of the group split.


Abramson, A. S. (1978). Static and dynamic acoustic cues in distinctive tones. Language and Speech, 21, 319-325.

Alexander, J. A., Wong, P. C. M., & Bradlow, A. R. (2005, September). Lexical tone perception in musicians and non-musicians. Paper presented at Interspeech 2005 - Eurospeech - 9th European Conference on Speech Communication and Technology, Lisbon, Portugal.

Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000). Familial aggregation of absolute pitch. American Journal of Human Genetics, 67, 755-758.

Bent, T., Bradlow, A. R., & Wright, B. A. (2006). The influence of linguistic experience on pitch perception in speech and non-speech sounds. Journal of Experimental Psychology: Human Perception and Performance, 32(1), 97-103.

Besson, M., & Schon, D. (2001). Comparison between language and music. Annals of the New York Academy of Sciences, 930, 232-258.

Bidelman, G. M., Gandour, J. T., & Krishnan, A. (2011). Cross-domain effects of music and language experience on the representation of pitch in the human auditory brainstem. Journal of Cognitive Neuroscience, 23(2), 425-434.

Bidelman, G., Hutka, S., & Moreno, S. (2013). Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: Evidence for bidirectionality between the domains of language and music. PLoS ONE, 8, e60676.

Burnham, D. K., & Brooker, R. (2002, September). Absolute pitch and lexical tones: Tone perception by non-musician, musician, and absolute pitch non-tonal language speakers. In Proceedings of the 7th International Conference on Spoken Language Processing, Denver, USA, 257-260.

Burnham, D., Francis, E., Webster, D., Luksaneeyanawin, S., Attapaiboon, C., Lacerda, F., & Keller, P. (1996, October). Perception of lexical tone across languages: Evidence for a linguistic mode of processing. In T. Bunnell & W. Isardi (Eds.), Proceedings of the 4th International Conference on Spoken Language Processing, Vol. 1, pp. 2514-2517.

Burnham, D., Kasisopa, B., Reid, A., Luksaneeyanawin, S., Lacerda, F., Attina, V., ... Webster, D. (in press). Universality and language-specific experience in the perception of lexical tone and pitch. Applied Psycholinguistics.

Cohen, D., & Baird, K. (1990). Acquisition of absolute pitch: The question of critical periods. Psychomusicology, 9, 31-37.

Cooper, A., & Wang, Y. (2010, May). The role of musical experience in Cantonese lexical tone perception by native speakers of Thai. Proceedings of the 5th International Conference on Speech Prosody, Chicago, IL.

Dediu, D., & Ladd, D. R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. Proceedings of the National Academy of Sciences, 104, 10944-10949.

Delogu, F., Lampis, G., & Olivetti Belardinelli, M. (2006). Music-to-language transfer effect: May melodic ability improve learning of tonal languages by native nontonal speakers? Cognitive Processing, 7(3), 203-207.

Delogu, F., Lampis, G., & Olivetti Belardinelli, M. (2010). From melody to lexical tone: Musical ability enhances specific aspects of foreign language perception. European Journal of Cognitive Psychology, 22, 46-61.

Deutsch, D., Dooley, K., Henthorn, T., & Head, B. (2009). Absolute pitch among students in an American music conservatory: Association with tone language fluency. Journal of the Acoustical Society of America, 125(4), 2398-2403.

Deutsch, D., Henthorn, T., & Dolson, M. (1999, November). Tone language speakers possess absolute pitch. Popular version of paper 4pPP5, presented at the 138th Meeting of the Acoustical Society of America, Columbus, OH.

Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. S. (2006). Absolute pitch among American and Chinese conservatory students: Prevalence differences, and evidence for a speech-related critical period (L). Journal of the Acoustical Society of America, 119(2), 719-722.

Fromkin, V. (Ed.). (1978). Tone: A linguistic survey. New York, NY: Academic Press.

Gottfried, T. L., & Xu, Y. (2008, June-July). Effect of musical experience on Mandarin tone and vowel discrimination and imitation. Paper presented at the Acoustics 08, Paris, France.

Hirst, D., & Di Cristo, A. (Eds.). (1998). Intonation systems. A survey of twenty languages. Cambridge, UK: Cambridge University Press.

Hove, M. J., Sutherland, M. E., & Krumhansl, C. L. (2010). Ethnicity effects in relative pitch. Psychonomic Bulletin & Review, 17, 310-316.

Hung, T.-H., & Lee, C.-Y. (2008, April). Processing linguistic and musical pitch by English-speaking musicians and non-musicians. Paper presented at the 20th North American Conference on Chinese Linguistics (NACCL-20), Columbus, Ohio.

Jiang, C., Hamm, J. P., Lim, V. K., Kirk, I. J., & Yang, Y. (2010). Processing melodic contour and speech intonation in congenital amusics with Mandarin Chinese. Neuropsychologia, 48(9), 2630-2639.

Lee, C. Y., & Hung, T. H. (2008). Identification of Mandarin tones by English-speaking musicians and nonmusicians. Journal of the Acoustical Society of America, 124(5), 3235-3248.

Lee, C. Y., & Lee, Y. F. (2010). Perception of musical pitch and lexical tones by Mandarin-speaking musicians. Journal of the Acoustical Society of America, 127(1), 481-490.

Levitin, D. (1999). Absolute pitch: Self-reference and human memory. International Journal of Computing and Anticipatory Systems, 4, 255-266.

Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: Perception, coding, and controversies. Trends in Cognitive Sciences, 9(1), 26-33.

Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than nonmusicians: Behavioral and electrophysiological evidence. Journal of Cognitive Neuroscience, 19, 1453-1463.

Mattock, K., & Burnham, D. (2006). Chinese and English infants' tone perception: Evidence for perceptual reorganization. Infancy, 10(3), 241-265.

Mattock, K., Molnar, M., Polka, L., & Burnham, D. (2008). The developmental course of lexical tone perception in the first year of life. Cognition, 106, 1367-1381.

Miyazaki, K. (1993). Absolute pitch as an inability: Identification of musical intervals in a tonal context. Music Perception, 11, 55-71.

Nan, Y., Sun, Y., & Peretz, I. (2010). Congenital amusia in speakers of a tone language: Association with lexical tone agnosia. Brain, 133(9), 2635-2642.

Parncutt, R., & Levitin, D. (2001). Absolute pitch. In S. Sadie (Ed.), The New Grove dictionary of music and musicians (pp. 37-39). London, UK: Macmillan.

Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6(7), 674-681.

Patel, A. D. (2008). Music, language and the brain. New York, NY: Oxford University Press.

Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142.

Patel, A. D. (2012a). Language, music, and the brain: A resource sharing framework. In P. Rebuschat, M. Rohrmeier, J. Hawkins, & I. Cross (Eds.), Language and Music as Cognitive Systems (pp. 204-223). Oxford: Oxford University Press.

Patel, A. D. (2012b). The OPERA hypothesis: Assumptions and clarifications. Annals of the New York Academy of Sciences, 1252, 124-128.

Peretz, I. (2012). Music, language and modularity in action. In P. Rebuschat, M. Rohrmeier, J. Hawkins, & I. Cross (Eds.), Language and music as cognitive systems (pp. 254-268). Oxford: Oxford University Press.

Peretz, I., & Coltheart, M. (2003). Modularity of musical processing. Nature Neuroscience, 6, 688-691.

Pfordresher, P. Q., & Brown, S. (2009). Enhanced production and perception of musical pitch in tone language speakers. Attention Perception & Psychophysics, 71(6), 1385-1398.

Plantinga, J., & Trainor, L. J. (2005). Memory for melody: Infants use a relative pitch code. Cognition, 98(1), 1-11.

Rø, M. H., Behne, D., & Wang, Y. (2006). The effects of musical experience on linguistic pitch perception: A comparison of Norwegian professional singers and instrumentalists. Journal of the Acoustical Society of America, 120, 3168 (Abstract).

Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological Science, 14, 262-266.

Schellenberg, E. G., & Trehub, S. E. (2008). Is there an Asian advantage for pitch memory? Music Perception, 25(3), 241-252.

Stevens, C. J., Keller, P. E., & Tyler, M. D. (2013). Tonal language background and detecting pitch contour in spoken and musical items. Psychology of Music, 41, 59-74.

Takeuchi, A., & Hulse, S. (1993). Absolute pitch. Psychological Bulletin, 113, 345-361.

Theusch, E., & Gitschier, J. (2011). Absolute pitch twin study and segregation analysis. Twin Research and Human Genetics, 14, 173-178.

Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion, 4, 46-64.

Tillmann, B., Burnham, D., Nguyen, S., Grimault, N., Gosselin, N., & Peretz, I. (2011). Congenital amusia (or tone deafness) interferes with pitch processing in tone languages. Frontiers in Psychology, 2, 120. doi:10.3389/fpsyg.2011.00120

Tillmann, B., Rusconi, E., Traube, C., Butterworth, B., Umilta, C., & Peretz, I. (2011). Fine-grained pitch processing of music and speech in congenital amusia. The Journal of the Acoustical Society of America, 130(6), 4089-4096.

Ward, W. D. (1999). Absolute pitch. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 265-298). San Diego, CA: Academic Press.

Watson, C. (1995). Absolute pitch. Honours thesis, Sydney Conservatorium of Music, University of Sydney.

Werker, J. F., & Logan, J. S. (1985). Cross-language evidence for three factors in speech perception. Perception & Psychophysics, 37, 35-44.

Werker, J. F., & Tees, R. C. (1984). Phonemic and phonetic factors in adult cross-language speech perception. Journal of the Acoustical Society of America, 75, 1866-1878.

Wong, P. C. M., Ciocca, V., Chan, A. H. D., Ha, L. Y. Y., Tan, L., & Peretz, I. (2012). Effects of culture on musical pitch perception. PLoS ONE, 7, e33424.

Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10(4), 420-422.

Yip, M. (2002). Tone. Cambridge, UK: Cambridge University Press.

Zatorre, R., & Gandour, J. T. (2008). Neural specializations for speech and pitch: Moving beyond the dichotomies. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 363(1493), 1087-1104.