Scholarly article on topic 'Decoding visual object categories from temporal correlations of ECoG signals'


Academic journal
NeuroImage
OECD Field of science
Keywords
{ECoG / "IT cortex" / "Object category" / Decoding / "Temporal coding"}

Abstract of research paper on Medical engineering, author of scientific article — Kei Majima, Takeshi Matsuo, Keisuke Kawasaki, Kensuke Kawai, Nobuhito Saito, et al.

Abstract How visual object categories are represented in the brain is one of the key questions in neuroscience. Studies on low-level visual features have shown that relative timings or phases of neural activity between multiple brain locations encode information. However, whether such temporal patterns of neural activity are used in the representation of visual objects is unknown. Here, we examined whether and how visual object categories could be predicted (or decoded) from temporal patterns of electrocorticographic (ECoG) signals from the temporal cortex in five patients with epilepsy. We used temporal correlations between electrodes as input features, and compared the decoding performance with features defined by spectral power and phase from individual electrodes. While using power or phase alone, the decoding accuracy was significantly better than chance, correlations alone or those combined with power outperformed other features. Decoding performance with correlations was degraded by shuffling the order of trials of the same category in each electrode, indicating that the relative time series between electrodes in each trial is critical. Analysis using a sliding time window revealed that decoding performance with correlations began to rise earlier than that with power. This earlier increase in performance was replicated by a model using phase differences to encode categories. These results suggest that activity patterns arising from interactions between multiple neuronal units carry additional information on visual object categories.


ELSEVIER

Contents lists available at ScienceDirect

NeuroImage

journal homepage: www.elsevier.com/locate/ynimg

Decoding visual object categories from temporal correlations of ECoG signals☆

Kei Majima a,b, Takeshi Matsuo c,d, Keisuke Kawasaki c, Kensuke Kawai d, Nobuhito Saito d, Isao Hasegawa c,e, Yukiyasu Kamitani a,*

a ATR Computational Neuroscience Laboratories, 2-2-2 Keihanna Science City, Kyoto 619-0288, Japan

b Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan

c Department of Physiology, Niigata University School of Medicine, 1-757 Asahi-machi St., Chuo-ku, Niigata 951-8510, Japan

d Department of Neurosurgery, The University of Tokyo Graduate School of Medicine, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan

e Center for Transdisciplinary Research, Niigata University, 1-757 Asahi-machi St., Chuo-ku, Niigata 951-8510, Japan

ARTICLE INFO ABSTRACT


© 2013 The Authors. Published by Elsevier Inc. All rights reserved.


Article history:

Accepted 11 December 2013
Available online 19 December 2013

Keywords: ECoG; IT cortex; Object category; Decoding; Temporal coding

Introduction

Response selectivity to visual object categories is a hallmark of the temporal visual cortex. Single neurons in the inferior temporal cortex (IT) respond to specific object categories such as faces, hands, and buildings (Desimone et al., 1984; Kreiman et al., 2006; Perrett et al., 1982; Tanaka, 1996; Tsao et al., 2006). While previous studies found that object category information can be coded in the activity of a single neuron or region, other studies have shown that object categories are represented by activity patterns of a population of neurons in the monkey inferotemporal (IT) cortex (Kiani et al., 2007) and fMRI voxels in the human IT cortex (Kriegeskorte et al., 2008).

While analyses of multiple neurons or voxels have revealed detailed object representations, these studies might have overlooked the possibility that correlations between neuronal units in single-trial signals contribute to object representation. Theoretical studies have suggested that neural representations can be achieved using temporal patterns over multiple neuronal units, including the order of response latencies (Gautrais and Thorpe, 1998; Thorpe et al., 2001; Van Rullen et al., 1998), spike sequences with millisecond precision (Abeles, 1991; Abeles et al., 1993; Lestienne and Strehler, 1987; Oram et al., 1999), and gamma-band synchronization (Eckhorn et al., 1988; Engel et al., 1991; Gray et al., 1989). Experimental results have also shown that such temporal patterns encode information on low-level visual features, such as light intensity in the retina (Gollisch and Meister, 2008), orientation in the primary visual cortex (Celebrini et al., 1993; Gawne et al., 1996; Shriki et al., 2012), and co-occurrence of edges (Eckhorn et al., 1988; Engel et al., 1991; Gray et al., 1989). Thus, it is of great interest whether temporal patterns are used to represent visual object categories.

☆ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-No Derivative Works License, which permits noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Corresponding author at: ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan. Fax: +81 774 95 1259.

E-mail address: kmtn@atr.jp (Y. Kamitani).

In this study, we used electrocorticogram (ECoG) to record object-evoked neural responses in the temporal cortex from five patients with epilepsy. ECoG allowed for simultaneous, high-temporal resolution measurement of neural responses at multiple sites over a wide range of the temporal cortex, which is generally difficult to perform using fMRI or single unit recording. We calculated the temporal correlations between ECoG electrodes using single-trial time series, and constructed a statistical classifier (decoder) to predict the category of a presented object from those signal features on a trial-by-trial basis (Kamitani and Tong, 2005). Temporal correlation measures the degree of in-phase synchronization between two time series (Varela et al., 2001). For comparison, we used the spectral power or phase values of ECoG signals in individual electrodes as features, and compared the decoding performances to examine whether temporal correlations carry additional information on object categories.

1053-8119/$ - see front matter © 2013 The Authors. Published by Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.neuroimage.2013.12.020

In addition, we investigated whether ECoG responses that produce informative correlations are time-locked to the stimulus or not. Informative correlations could be generated by category-specific ECoG responses with constant latencies across trials, or alternatively by relative time series of ECoG signals with variable latencies across trials. To examine this, we conducted a shuffling analysis (Averbeck et al., 2006; Yamashita et al., 2008). In this analysis, we created data in which relative timing/phase differences between electrodes within each trial were destroyed (trial-shuffled data) from the original ECoG data by randomly permuting trials with the same stimulus in each electrode. We then compared the decoding performance obtained with the shuffled data to that obtained with the original data.

To characterize the time course of decoding performance, we calculated decoding performance as a function of time (see Liu et al., 2009) and compared this between spectral power values and temporal correlations. Finally, to specify which encoding model explained the results of the time course analysis at the neuronal electric source level, we performed a simulation analysis in which two representative models were tested: a model where sources encode information with their response latencies (Celebrini et al., 1993; Gautrais and Thorpe, 1998; Gollisch and Meister, 2008; Shriki et al., 2012; Van Rullen et al., 1998), and a model where sources encode information with their phase differences (Eckhorn et al., 1988; Engel et al., 1991; Gray et al., 1989). The results from these analyses suggest that activity patterns arising from interactions between multiple neuronal units carry a significant amount of information on visual object categories.

Materials and methods

Subjects

Five patients with medically intractable epilepsy (5 female, 22-42 years, Table 1) participated in our experiments. All patients were admitted to The University of Tokyo Hospital, Japan. Patients underwent electrode implantation for the purpose of localizing seizure foci to guide neurosurgical treatment (Fig. 1). The locations of electrodes were determined solely by therapeutic considerations. We obtained written informed consent from the patients, and all experimental protocols were approved by the institutional review board at the hospital (#1797(3)).

ECoG recordings

Patients were implanted with subdural electrode arrays arranged in grids or strips (Unique Medical Co., Ltd., Tokyo, Japan). Each grid/strip contained 4-20 electrodes. Each electrode contact was 3 mm in diameter with 10 mm separation, or 1.5 mm in diameter with 5 mm separation. The number and location of the recording sites in the temporal, occipital, and frontal lobes were determined exclusively by clinical criteria. Data were obtained from a total of 628 recording sites, with 120-127 electrodes per subject. Recorded signals were amplified using a reference electrode placed on the scalp, filtered between 0.55 and 150 Hz and sampled at 400 Hz (Nicoletone, CareFusion, San Diego, CA, USA). In addition to this bandpass filtering, the signals were filtered to pass each of five frequency bands (1-10, 10-20, 20-30, 30-80 and 80-150 Hz) when calculating ECoG features in bands of interest. All data were acquired during periods without seizure events.

Electrode localization

To localize electrodes, we integrated the anatomical information of the brain provided by preoperative magnetic resonance imaging (MRI) with the spatial information of the electrode positions provided by postoperative computed tomography (CT). For each subject, the 3D brain surface was reconstructed and an automatic registration based on mutual information was performed using Avizo (Maxnet Co., Ltd., Tokyo, Japan). Because the location of the recording site depended on clinical criteria, various ventral and lateral cortical areas were evaluated for each patient, and the coordinate of each electrode contact in their stereotactic scheme was measured (using the Talairach coordinate system). Coordinates were used to anatomically localize contacts using the proportional atlas of Talairach and Tournoux (1993), after a linear scale adjustment to correct size differences between the patient's brain and the Talairach model using Statistical Parametric Mapping, Version 8 software (SPM8, Wellcome Trust, London, UK) (http://www.fil.ion.ucl.ac.uk/spm/). Electrodes on the IT cortex, which consists of the middle and inferior temporal gyri, were selected and used for the analysis of this study (55 electrodes for S1, 84 electrodes for S2, 117 electrodes for S3, 104 electrodes for S4, 106 electrodes for S5; Fig. 1, Table 1).

Stimulus presentation

We prepared 120 colored photographs of objects from 24 different categories as stimuli (Fig. 2A). There were five different exemplars per category. The stimuli were presented on a 27-inch LCD monitor at a viewing distance of 57 cm with a central fixation point (0.5°) (Fig. 2B). Each stimulus subtended a 6° x 6° visual angle and was presented for 300 ms followed by a 900-ms interval period (Fig. 2C). The stimuli were presented in pseudorandom order and each stimulus was presented either 10 or 11 times per subject. We instructed the subjects to fix their eyes on the fixation point and to perform a one-back task, indicating whether an exemplar was repeated successively or not by pressing a button (55-120 repetition trials per subject). We excluded trials with button presses from any analyses reported in this study.

ECoG features

Table 1

Patient demographics, task performance, and the number of electrodes.

| Subject | Age | Sex | Task performance (success/total) | # of all implanted electrodes | # of electrodes on IT cortex (left side/left bottom/right side/right bottom) |
|---|---|---|---|---|---|
| S1 | 42 | F | No record | 120 | 55 (15/40/0/0) |
| S2 | 36 | F | 93/112 | 127 | 84 (10/32/10/32) |
| S3 | 22 | F | 109/113 | 127 | 117 (5/52/5/55) |
| S4 | 35 | F | 111/120 | 127 | 104 (0/52/0/52) |
| S5 | 38 | F | 53/55 | 127 | 106 (2/52/0/52) |

Fig. 1. ECoG electrodes. Structural MRI images of each patient are shown with the positions of ECoG electrodes (black and white circles). Large and small circles show electrodes with contact sizes of 3.0 mm and 1.5 mm, respectively. A total of 120 electrodes for S1 and 127 electrodes for the other patients were implanted for medical diagnosis. Electrodes on the IT cortex that consists of the middle and inferior temporal gyri were selected and used for analysis (white circles).

Signal features used as input to decoding analyses were calculated from single-trial time series of the individual electrodes in the temporal cortex within a time window from 0 to 300 ms relative to the stimulus onset or a 100/300-ms sliding time window with a step size of 25 ms.

"Power features" were the power values of five frequency bands (1-10, 10-20, 20-30, 30-80 and 80-150 Hz) calculated from the power spectrum for each time window (the number of electrodes times the five frequency bands). "Correlation features" were the correlation coefficients of time series for all pairs of electrodes within each time window (1485-6786 features for each subject). In the analysis to compare specific frequency bands (the five bins of power spectral frequency), power features were limited to those in the frequency bands of interest. Correlation features were re-calculated from the band-pass filtered time series.

Correlation features are higher-order variables based on the products of (normalized) signal amplitudes between electrodes at each time point. As a result, the number of available features is much larger than that of power features. To control for the effect of higher-order feature extraction and the number of available features, we used the products of power values as an additional feature type ("product-of-power features"). In each frequency band, the products of the power values were calculated for all pairs of electrodes (five [frequency bands] times the number of correlation features).
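The three feature types described so far can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the trial is random noise standing in for a 300-ms window from 55 electrodes, and band power is assumed to be the periodogram summed over the FFT bins inside each band.

```python
import numpy as np

FS = 400  # sampling rate (Hz), as reported for the recordings
BANDS = [(1, 10), (10, 20), (20, 30), (30, 80), (80, 150)]  # Hz


def band_power(trial, fs=FS, bands=BANDS):
    """Return an (n_electrodes, n_bands) array of band power.

    Assumption: band power is the periodogram summed over FFT bins in [lo, hi).
    """
    freqs = np.fft.rfftfreq(trial.shape[1], d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(trial, axis=1)) ** 2
    return np.stack(
        [spectrum[:, (freqs >= lo) & (freqs < hi)].sum(axis=1) for lo, hi in bands],
        axis=1,
    )


def correlation_features(trial):
    """Pearson correlation of the time series for every electrode pair."""
    corr = np.corrcoef(trial)
    i, j = np.triu_indices(trial.shape[0], k=1)
    return corr[i, j]


def product_of_power_features(power):
    """Products of band power for every electrode pair, one block per band."""
    i, j = np.triu_indices(power.shape[0], k=1)
    return (power[i] * power[j]).T.ravel()


# Hypothetical trial: 55 electrodes, 300 ms at 400 Hz (120 samples).
rng = np.random.default_rng(0)
trial = rng.standard_normal((55, 120))

power = band_power(trial)
power_feats = power.ravel()                    # electrodes x 5 bands
corr_feats = correlation_features(trial)       # n(n - 1) / 2 pairs
prod_feats = product_of_power_features(power)  # 5 bands x n(n - 1) / 2 pairs
```

With 55 electrodes this yields 55 × 54 / 2 = 1485 correlation features, the lower end of the range quoted in the text, and five times as many product-of-power features.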

We also used the phase values of ECoG time series as features (Hammer et al., 2013; Lopour et al., 2013). We calculated the phase values of five frequencies (6, 16, 26, 56 and 116 Hz; near-middle values of the five frequency bands) in each time window using the fast Fourier transform (FFT), and took the sine and cosine values. These values were used as "phase features" (the number of electrodes × 5 [frequencies] × 2 [trigonometric functions]). Note that individual phase features do not explicitly indicate relative timings between electrodes, although the linear decoder may detect relative timings via the weighted summation of phase features.
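A possible way to compute such phase features is sketched below. The function name is hypothetical, and picking the FFT bin closest to each target frequency is an assumption; the text does not say how the target frequencies were mapped onto FFT bins.

```python
import numpy as np

FS = 400                            # sampling rate (Hz)
PHASE_FREQS = [6, 16, 26, 56, 116]  # target frequencies (Hz) from the text


def phase_features(trial, fs=FS, freqs=PHASE_FREQS):
    """Sine and cosine of the FFT phase at each target frequency.

    trial has shape (n_electrodes, n_samples).
    Assumption: the FFT bin nearest to each target frequency is used.
    """
    fft_freqs = np.fft.rfftfreq(trial.shape[1], d=1.0 / fs)
    spectrum = np.fft.rfft(trial, axis=1)
    bins = [int(np.argmin(np.abs(fft_freqs - f))) for f in freqs]
    phase = np.angle(spectrum[:, bins])  # (n_electrodes, n_freqs)
    return np.concatenate([np.sin(phase).ravel(), np.cos(phase).ravel()])


# Hypothetical trial: 55 electrodes, 300 ms at 400 Hz.
rng = np.random.default_rng(0)
trial = rng.standard_normal((55, 120))
phase_feats = phase_features(trial)  # 55 x 5 x 2 values
```

Encoding each phase as a sine and cosine pair avoids the wrap-around at ±π that a raw angle would present to a linear decoder.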

In the shuffling analysis to characterize informative correlations, correlation features were calculated with ECoG data in which relative timing/phase differences between electrodes within each trial were destroyed by a trial-shuffling procedure (Averbeck et al., 2006; Yamashita et al., 2008). In each electrode, we randomly permuted the data across trials for the same visual stimulus, while preserving the ECoG time series within each electrode and trial. In these shuffled data, if ECoG responses to each category were constant across trials (time-locked to the stimulus onset), the original correlations would be preserved (Fig. 3A). However, if the original correlations were from the relative time series in each trial (not time-locked to the stimulus onset) and the variability of response latencies across trials was sufficiently large, the correlations would be removed by the shuffling (Fig. 3B).
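The trial-shuffling procedure can be illustrated as follows. This is a sketch under the assumption that the data are stored as a trials × electrodes × samples array; the function and variable names are hypothetical.

```python
import numpy as np


def shuffle_trials_within_stimulus(data, stimulus_ids, rng):
    """Permute trials of the same stimulus independently in each electrode.

    data has shape (n_trials, n_electrodes, n_samples). The time series within
    each electrode and trial is kept intact; only its trial assignment changes.
    """
    shuffled = data.copy()
    for stim in np.unique(stimulus_ids):
        idx = np.flatnonzero(stimulus_ids == stim)
        for ch in range(data.shape[1]):
            # A fresh permutation per electrode breaks the within-trial pairing
            # between electrodes.
            shuffled[idx, ch] = data[rng.permutation(idx), ch]
    return shuffled


# Toy data: 2 stimuli x 10 repetitions, 3 electrodes, 8 samples.
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 3, 8))
stimulus_ids = np.repeat(np.arange(2), 10)
shuffled = shuffle_trials_within_stimulus(data, stimulus_ids, rng)
```

Correlation features would then be recomputed from `shuffled` exactly as for the original data, so any drop in decoding accuracy reflects the loss of within-trial relationships between electrodes.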

Feature vectors were created from individual trials for a given time window in each subject. A feature vector consisted of the features described above or the concatenated features of powers and correlations, each calculated from a time window of each trial. Before training, we conducted a feature normalization procedure and a feature selection procedure (see the next section).
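For the sliding-window analyses, the window boundaries can be generated as in the sketch below. The helper name and the 500-ms epoch length are illustrative assumptions; only the window widths (100 or 300 ms), the 25-ms step, and the 400-Hz sampling rate come from the text.

```python
def sliding_windows(n_samples, fs=400, window_ms=100, step_ms=25):
    """Return (start, stop) sample indices of each window within one epoch."""
    width = int(round(window_ms * fs / 1000))
    step = int(round(step_ms * fs / 1000))
    return [(start, start + width) for start in range(0, n_samples - width + 1, step)]


# Hypothetical 500-ms epoch sampled at 400 Hz (200 samples).
windows_100 = sliding_windows(200, window_ms=100)
windows_300 = sliding_windows(200, window_ms=300)

# Window centres in milliseconds, e.g. for plotting accuracy against time.
centres_100 = [(start + stop) / 2 / 400 * 1000 for start, stop in windows_100]
```

A feature vector for a given window is then computed from `trial[:, start:stop]`, and the combined feature type is the concatenation of the power and correlation vectors for that window.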

Decoding procedure

We constructed a linear classifier (decoder) to predict the categories of presented stimuli from ECoG feature vectors on a trial-by-trial basis. Feature vectors labeled by the stimulus categories of individual trials in each subject were divided into training and test datasets, and a linear support vector machine (SVM; Vapnik, 1998) algorithm determined the parameters of the decoder using the training dataset. The decoder calculated the linearly weighted sum of the features plus a bias for each category (class) given a feature vector from the test dataset. The category with the maximum value was chosen as the predicted category (Kamitani and Tong, 2005).

The values of each feature were normalized using the sample mean and standard deviation calculated with the training dataset. The dimensionality of the feature vector was reduced by selecting informative features based on a univariate analysis (F-statistics) applied to the training dataset. We ranked the features by the F-value that indicated differential responses to the categories, and the top N features were used as inputs to the decoder. The number of used features (N) was decided by a cross-validation analysis within the training dataset, where N was varied from 50 to 1000 in increments of 50, and the N with the highest accuracy was chosen (nested cross-validation). The average numbers of selected features for the analysis with the 0-300-ms time window were 70 ± 63 (mean ± SD across subjects) for power features, 694 ± 404 for correlation features, and 778 ± 330 for the combined features. To see the dependence on the number of features, we also performed cross-validation analysis with a fixed N (without the nested cross-validation to optimize N), and decoding performance was calculated for each feature type while N was varied (from 10 to 1000 increased by 10).

[Figure 2 panels: A, example images for the stimulus categories, including Animal (small), Animal (large), Fish, Insect, Vegetable, Flower, Fruit, Scene, Forest, Pattern, Vehicle, Clothing, Jewelry, Face part, Money, Building, Commodity, Letter string, Drawing, and Beverage; B, fixation point 0.5 deg, stimulus size 6 deg x 6 deg; C, stimulus presentation 300 ms, interval 900 ms.]

Fig. 2. Visual stimuli and experimental design. A, Visual stimuli. We used 120 colored photographs of objects from 24 different categories as visual stimuli. There were five different exemplars per category. B, Stimulus presentation. A visual stimulus (6° x 6°) was shown with a gray background and a central fixation point (0.5°). C, Time course of presentation. Visual stimuli were sequentially presented to subjects. Each presentation was 300 ms long followed by a 900-ms interval.

To evaluate generalization performance for category classification across different exemplars, we ensured that trials corresponding to the same visual stimuli were not included in both the training and test datasets (Vindiola and Wolmetz, 2011). We divided the 120 stimuli into five groups, each of which contained 24 stimuli from the 24 different categories and divided the corresponding trials into five groups. Four groups were then used to train a decoder and the remaining group was used for evaluating the trained classifier. This procedure was repeated until the trials from all five groups were tested (5-fold cross-validation), and the percentage of correct classification was calculated. The cross-validation for determining the number of features was performed using the four groups in the training dataset (4-fold cross-validation).
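The exemplar-wise cross-validation split, together with training-set-only normalization and F-statistic feature ranking, might look like the sketch below. All function names are hypothetical, the data are random, and the classifier itself is left out; any linear classifier (the paper uses a linear SVM) would be fit on the returned training matrix.

```python
import numpy as np


def exemplar_folds(stimulus_ids, category_ids, n_folds=5):
    """Assign trials to folds so that each fold holds one exemplar per category."""
    fold_of_trial = np.empty(len(stimulus_ids), dtype=int)
    for cat in np.unique(category_ids):
        stims = np.unique(stimulus_ids[category_ids == cat])
        for k, stim in enumerate(stims):
            fold_of_trial[stimulus_ids == stim] = k % n_folds
    return fold_of_trial


def f_statistic(X, y):
    """One-way ANOVA F-value of each feature; X is (n_trials, n_features)."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (len(classes) - 1)) / (within / (len(y) - len(classes)))


def prepare_fold(X, y, folds, test_fold, n_features):
    """Normalize and select features using the training trials only."""
    train = folds != test_fold
    mean, std = X[train].mean(axis=0), X[train].std(axis=0)
    Z = (X - mean) / std
    top = np.argsort(f_statistic(Z[train], y[train]))[::-1][:n_features]
    return Z[train][:, top], Z[~train][:, top], top


# Toy data: 24 categories x 5 exemplars x 10 repetitions, 40 random features.
rng = np.random.default_rng(0)
stimulus_ids = np.repeat(np.arange(120), 10)
category_ids = stimulus_ids // 5
X = rng.standard_normal((1200, 40))
X[:, 0] += category_ids  # make feature 0 strongly category-dependent

folds = exemplar_folds(stimulus_ids, category_ids)
X_train, X_test, selected = prepare_fold(X, category_ids, folds, test_fold=0, n_features=10)
```

Computing the mean, standard deviation, and F-values from the training trials alone keeps information from the held-out exemplars out of the decoder, which is the point of the exemplar-wise split.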

[Figure 3 panels: A, ECoG response time-locked to stimulus onset; B, ECoG response not time-locked to stimulus onset. Each panel shows original and shuffled data for channels Ch 1 to Ch 3 in Trial 1 and Trial 2.]

Fig. 3. Effect of trial shuffling on ECoG responses time-locked and unlocked to the stimulus onset. A, Effect on time-locked responses. Multi-electrode ECoG responses specific to a stimulus arise with constant latencies across trials. In this case, the original response pattern remains after permuting trials in each electrode. B, Effect on unlocked responses. Multi-electrode ECoG responses specific to a stimulus arise with different latencies across trials while preserving relative timings between electrodes. In this case, the original response pattern is destroyed after permuting trials in each electrode.

To compare decoding accuracy between conditions, we used a chi-square test for within-subject analysis, and a paired t-test for group analysis. For the group analysis, we calculated the logit-transformed accuracy of each subject, log(a / (1 − a)) where a is an accuracy, and then applied a paired t-test to those values. We used logit-transformed accuracies rather than the original ones because a normal distribution has infinite support while accuracies are bounded from zero to one, and the assumption that obtained accuracies follow a normal distribution is not suitable (Lesaffre et al., 2007). We could have used a hierarchical model to account for a binomial distribution of the accuracy in individual subjects, but since we had many trials in individual subjects (1200 trials per subject) and the standard error of the mean accuracy was very small (1.1 ± 0.1%), the measured accuracy was assumed to be the mean accuracy in each subject.
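The group-level comparison on logit-transformed accuracies can be sketched with the standard library alone. The accuracy values below are made up for illustration and are not results from the paper; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def logit(a):
    """Map an accuracy in (0, 1) onto the whole real line."""
    return math.log(a / (1.0 - a))


def paired_t_on_logits(acc_a, acc_b):
    """Paired t statistic and degrees of freedom on logit-transformed accuracies."""
    diffs = [logit(a) - logit(b) for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Illustrative accuracies for five subjects (NOT values from the paper).
acc_correlation = [0.20, 0.15, 0.25, 0.18, 0.22]
acc_power = [0.12, 0.10, 0.15, 0.11, 0.14]
t_value, dof = paired_t_on_logits(acc_correlation, acc_power)
```

The resulting t statistic would be compared against a t distribution with `dof` degrees of freedom (four for five subjects) to obtain the P value.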

Simulation analysis

To perform a simulation analysis, we assumed 200 neural signal sources arranged in a one-dimensional array with 1-mm intervals. Ten ECoG electrodes were placed 1 mm above those sources with 10 mm intervals. The signals of the ECoG electrodes were expressed by the spatial sums of the sources with a lead field matrix (Nunez, 1981).

In the latency coding model, each source had a fixed waveform produced by sampling from an i.i.d. Gaussian distribution (N(0, 32)). Given a stimulus, each source produced a waveform with a latency specific to the stimulus category. Those latency values were uniformly distributed from 130 to 180 ms over stimulus categories. In each trial, the latencies of all sources were jittered by the same amount randomly chosen from 0 to 50 ms (while preserving relative timings across trials), and independent Gaussian white noise (N(0, 32)) was added.

In the phase reset model, the sources were oscillators at a frequency of 40 Hz with an amplitude of one. In each trial, the initial phase values of all sources were randomly chosen. The phase values were then reset to values specific to the stimulus category at the reset time, which was randomly chosen for each trial between 130 and 180 ms. Independent Gaussian white noise (N(0, 32)) was added to all sources in each trial.

The number of categories was set to 24 and each category had 50 trials (24 x 50 = 1200 trials in total), to match the empirical data. Electrode signals were filtered between 0.55 and 150 Hz, which was similar to the recording condition in the experiments. Using simulated data produced by each model, we constructed decoders using either spectral power or correlation features, and calculated the decoding performance as a function of time.

In addition, we also tested modified versions of the two models to take into account category-specific amplitude modulation of sources. It is commonly assumed that the amplitudes of neuronal sources change depending on the stimulus (Naruse et al., 2010; Palva and Palva, 2007). In the latency coding model, each source signal before noise addition was multiplied by a value specific to the given stimulus category, which was randomly chosen from one to (M + 1) in each category, for each electrode. In the phase reset model, the amplitude of each source was changed from one to a value specific to the given stimulus category at 130 ms. The amplitudes after 130 ms were multiplied by category-specific values randomly chosen from one to (M + 1). The parameter M was manipulated to control the degree of dominance of amplitude coding in the model. When M = 0, both models had no amplitude modulation. Power and correlation features were calculated from a 100-ms sliding time window. The correlations for all electrode pairs and the power values for all electrodes and the five frequency bands were used as input features with no feature selection.
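One trial of the phase reset model (without amplitude modulation, M = 0) could be generated roughly as below. Several choices here are assumptions rather than details from the paper: the lead field is taken to fall off with the inverse squared distance, the electrode row is centred over the source array, the epoch is 500 ms long, the noise standard deviation is left as a free parameter, and the 0.55-150 Hz band-pass filter is omitted.

```python
import numpy as np

FS = 400                          # sampling rate (Hz)
N_SOURCES, N_ELECTRODES = 200, 10


def lead_field(height_mm=1.0, source_step_mm=1.0, electrode_step_mm=10.0):
    """Assumed lead field: inverse squared distance from source to electrode."""
    src_x = np.arange(N_SOURCES) * source_step_mm
    elec_x = np.arange(N_ELECTRODES) * electrode_step_mm
    elec_x = elec_x + (src_x[-1] - elec_x[-1]) / 2  # assumption: centred row
    dist_sq = (elec_x[:, None] - src_x[None, :]) ** 2 + height_mm ** 2
    return 1.0 / dist_sq  # (n_electrodes, n_sources)


def phase_reset_trial(category_phases, rng, duration_ms=500, freq_hz=40.0, noise_sd=1.0):
    """Simulate electrode signals for one trial of the phase reset model."""
    t = np.arange(int(duration_ms * FS / 1000)) / FS
    reset_time = rng.uniform(0.130, 0.180)  # seconds, varies across trials
    initial_phase = rng.uniform(0, 2 * np.pi, size=N_SOURCES)
    before = np.sin(2 * np.pi * freq_hz * t[None, :] + initial_phase[:, None])
    # After the reset, every source restarts from its category-specific phase.
    after = np.sin(2 * np.pi * freq_hz * (t[None, :] - reset_time) + category_phases[:, None])
    sources = np.where(t[None, :] < reset_time, before, after)
    sources = sources + rng.normal(0.0, noise_sd, size=sources.shape)
    return lead_field() @ sources  # (n_electrodes, n_samples)


rng = np.random.default_rng(0)
phases_by_category = rng.uniform(0, 2 * np.pi, size=(24, N_SOURCES))  # hypothetical code book
signals = phase_reset_trial(phases_by_category[3], rng)
```

Because the reset time varies from trial to trial while the phase differences between sources are fixed per category, the category information in this sketch lives in relative rather than stimulus-locked timing.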

Results

We measured ECoG data from implanted electrodes located predominantly in the temporal cortex; data from five subjects were recorded while they sequentially viewed natural images (Fig. 2C). We confirmed that all data were acquired during periods without seizure events. Electrodes on the IT cortex that consists of the middle and inferior temporal gyri were selected and used for the analysis of this study (Fig. 1, Table 1).

To illustrate the category selectivity of spectral power and temporal correlations, we calculated the trial-averaged time courses of power values in the 1-10 Hz spectral band and of temporal correlations in response to each stimulus category. We plotted the time courses of the power values at two representative electrodes for the face category and the letterstring category, and those of the temporal correlation between the electrodes (Fig. 4). In this example, the temporal correlation showed category-selective time courses, whereas the modulation of power at the two electrodes was not distinctive.

To evaluate information encoded in each type of feature, we performed decoding analysis using the power, correlation and combined features calculated from the ECoG time series from 0 to 300 ms relative to the stimulus onset (24 categories; chance level, 4.16%). Fig. 5A shows the cross-validated performances for the three feature types in each subject, using the number of features optimized by a nested cross-validation (see Materials and methods). Performance exceeded the chance level for all feature types and all subjects (P < 10⁻⁵, binomial test). By combining power and correlation features, the decoding performance was substantially improved compared with the performance using power alone (P < 0.05, paired t-test across subjects; P < 0.05 in all individual subjects, chi-square test). Even correlation features alone significantly outperformed power features (P < 0.05, paired t-test across subjects; P < 0.05 in four out of five subjects, chi-square test).
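A one-sided binomial test against the chance level of 1/24 can be written with the standard library, as sketched here. The function name and the trial counts in the example are hypothetical; the probability mass is computed in log space because the binomial coefficients for about 1200 trials are too large to convert to floats directly.

```python
import math


def binomial_tail(n_correct, n_trials, chance):
    """P(X >= n_correct) for X ~ Binomial(n_trials, chance)."""
    log_p, log_q = math.log(chance), math.log1p(-chance)
    total = 0.0
    for k in range(n_correct, n_trials + 1):
        log_pmf = (
            math.lgamma(n_trials + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_trials - k + 1)
            + k * log_p
            + (n_trials - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


# Hypothetical example: 100 correct out of 1200 trials with 24 categories.
p_value = binomial_tail(100, 1200, 1 / 24)
```

With 1200 trials the expected number of correct trials at chance is 50, so even modest accuracies lie far in the tail of the null distribution.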

Fig. 5B shows the mean performances as a function of the number of used features. In addition to power, correlation, and combined features, we plotted the results for the products of power values (product-of-power features), which were introduced to control for the effect of higher-order feature extraction and the number of available features (see Materials and methods). The performances with power and product-of-power features peaked around 50 features, while the performances with correlation and combined features continued to improve, even with > 500 features. When compared with the same number of features used, correlation and combined features outperformed power features even at small numbers of features (around 50-250 features). Furthermore, the decoding performance with product-of-power features remained lower than those with the other features, even though the number of available features was the largest. Thus, the performance improvement by adding correlation features (Fig. 5A) cannot be attributed to the larger number of used or available features. Temporal correlations appear to carry additional information about object categories that is not represented by power or higher-order features generated from power.

[Figure 4 panels: Power (1-10 Hz) and Correlation, each plotted against Time (ms), for the face and letterstring categories.]

Fig. 4. Category selectivity in power and correlation with representative electrodes. The trial-averaged time courses of the power values in the 1-10 Hz band at two electrodes (middle) and of the correlations (right) are shown for the face and letterstring categories.

Correlation features reflect phase or timing differences of ECoG time series between electrodes. To see whether phase information in individual electrodes is sufficient to achieve the high decoding performance obtained with correlations, we tested decoding accuracies using phase features (the number of features optimized by a nested cross-validation; see Materials and methods). Fig. 6 shows the mean performances obtained with power, phase, and correlation features. The decoding performance obtained with correlation features was better than the others (P < 0.05, paired t-test). Although phase features could implicitly encode phase or timing differences between electrodes, the results suggest that a more explicit representation of the differences by correlations was critical for achieving the high decoding performance.

[Figure 5 panels: A, decoding performance by Subject (S1 to S5) for correlation and for power & correlation, with chance level; B, decoding performance against # of features for product of power, correlation, and power & correlation, with chance level.]

Fig. 5. Decoding performance obtained with power, correlation, and both. A, Results from individual subjects and the means (dashed line: chance level; *: P < 0.05; chi-square test on individual subjects, and paired t-test on mean performance; error bar: s.e.m. over subjects). The number of features used was determined by a nested cross-validation procedure. B, Results with different numbers of features. In addition to the above three types, the performance of product-of-power features is plotted (means across subjects). Note that the number of available features was fewer for power (<275) than for the other feature types.

Informative correlations could be generated by category-specific ECoG responses with constant latencies across trials (stimulus-locked responses), or alternatively by relative time series of ECoG signals with variable latencies across trials. To examine which type of response produced informative correlations, we created trial-shuffled data (see Materials and methods) and compared the decoding performance between the original and shuffled datasets. Fig. 7 shows the decoding performance using the original and shuffled data for each frequency band. Using the data before the band-pass filtering ("All"), the shuffling degraded the performance in most subjects (P < 0.05 in 4/5 subjects, chi-square test). Among the five frequency bands, the 30-80 Hz and the 80-150 Hz bands showed performance degradation (P < 0.05 in 5/5 and 2/5 subjects, respectively). At the group level, the decoding performance in the 30-80 Hz band was significantly degraded by shuffling (P < 0.05, paired t-test, corrected for multiple comparisons). While the overall performance was highest in the 1-10 Hz band, the performance was comparable between the original and shuffled data. The 10-20 Hz and 20-30 Hz bands showed poor performance in both the original and shuffled data. These results suggest that stimulus-unlocked, relative time series may contribute to category coding in the high frequency bands. Informative correlations in the low frequency band may arise in a stimulus-locked manner, but it is also possible that the variability of response latencies across trials was too small compared to the low frequency cycle to destroy correlations from relative time series.

Fig. 6. Decoding performance obtained with power, phase and correlation. The values for power and correlation features are the same as those in Fig. 5A (dashed line: chance level; *: P < 0.05, paired t-test, error bar: s.e.m. over subjects).

Fig. 7. Shuffling analysis. Decoding performance obtained with correlation features is compared between the original and shuffled data. The analysis was applied to ECoG signals before band-pass filtering (all) and band-pass filtered signals. The bar graph shows the mean performance over subjects (dashed line: chance level). Symbols represent individual subjects.
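The logic of the trial-shuffling comparison can be sketched as follows. Within each category, the order of trials is permuted independently for each electrode, so that stimulus-locked averages at each electrode are preserved while the within-trial relationship between electrodes is broken. The function name, array layout, and toy data are assumptions for illustration only; the procedure actually used is specified in Materials and methods.

```python
import numpy as np

def shuffle_trials_within_category(data, labels, rng):
    """Permute trial order per electrode, separately for each category.

    data:   array (n_trials, n_electrodes, n_samples), hypothetical layout.
    labels: array (n_trials,) of category labels.
    """
    shuffled = data.copy()
    for category in np.unique(labels):
        idx = np.flatnonzero(labels == category)
        for electrode in range(data.shape[1]):
            # Independent permutation for every electrode, so signals that
            # were recorded in the same trial are no longer paired.
            perm = rng.permutation(idx)
            shuffled[idx, electrode, :] = data[perm, electrode, :]
    return shuffled

# Toy data: 3 categories x 4 trials, 3 electrodes, 50 samples.
rng = np.random.default_rng(0)
data = rng.standard_normal((12, 3, 50))
labels = np.repeat([0, 1, 2], 4)
shuffled = shuffle_trials_within_category(data, labels, rng)
```

If decoding from correlations survives this manipulation, the informative correlations can be attributed to stimulus-locked responses; if it degrades, the trial-by-trial relative time series must carry information.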

To characterize the time course of decoding performance, we used a sliding time window and calculated the decoding performance as a function of time. The time window was shifted by 25 ms, and decoding performance was plotted for the power and correlation features (Fig. 8, left). In this analysis, power values from five frequency bands and correlations for the raw ECoG signals were used as input features, respectively. The decoding performance with the 300-ms time window began to rise when the center of the time window was earlier than 100 ms after the stimulus onset and remained above chance level even after the disappearance of the stimulus (300 ms). To evaluate when decoding performance began to rise from chance level, we defined the onset time as the first point at which the mean decoding performance over subjects exceeded the 99th percentile (% correct = 5.4%) in the performance distribution at 0 ms. The decoding performance with correlations rose earlier than that with power (25 ms and 100 ms for correlation and power features, respectively). To see the dependence on the width of the time window used, we calculated decoding accuracies with a 100-ms sliding time window (Fig. 8, right). Even with the shorter time window, the onset time for correlations was earlier than that for power. The peak performance for correlations was slightly lower than that for power with this short time window, presumably because correlations could be better estimated using a longer time series. The results indicate the capacity of temporal correlations between electrodes to encode category information in early responses.

Fig. 8. Time course of decoding performance. Decoding performances obtained with power (red) and correlation (blue) are plotted as a function of time for 300-ms and 100-ms time windows (dashed line: chance level; shaded region: s.e.m. over subjects). Each arrow indicates the center of the time window at which the decoding performance exceeded the threshold.
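The onset-time criterion can be written compactly. The sketch below assumes that the performance distribution at 0 ms is available as an array of accuracy values and that mean accuracy has been computed at each window center; how that distribution was obtained is not restated here. All numbers are toy values chosen for illustration, not data from this study.

```python
import numpy as np

def onset_time(centers_ms, mean_accuracy, baseline_accuracies, q=99.0):
    """First window center whose mean accuracy exceeds the q-th percentile
    of the baseline (0 ms) accuracy distribution; None if never exceeded."""
    threshold = np.percentile(baseline_accuracies, q)
    above = np.flatnonzero(np.asarray(mean_accuracy) > threshold)
    if above.size == 0:
        return None, threshold
    return centers_ms[above[0]], threshold

# Toy values (hypothetical): window centers every 25 ms, accuracy in % correct.
centers_ms = np.arange(0, 201, 25)
mean_accuracy = np.array([3.0, 3.2, 3.9, 4.5, 6.0, 7.5, 9.0, 9.5, 9.8])
baseline = np.array([2.0, 2.5, 3.0, 3.5, 4.0])
onset, threshold = onset_time(centers_ms, mean_accuracy, baseline)
```

Applying the same function to the power and correlation time courses gives one onset per feature type, which is the comparison reported above.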

To examine what mechanism could explain the early onset of the good decoding performance with correlations, we tested two representative models: a latency coding model and a phase reset model, where category-specific temporal patterns were produced by latency differences and by phase differences across sources, respectively (Figs. 9A, B; see Materials and methods; Eckhorn et al., 1988; Engel et al., 1991; Gautrais and Thorpe, 1998; Gray et al., 1989; Thorpe et al., 2001; Van Rullen et al., 1998).

In the latency coding model, decoding performance began to rise around the same time for both power and correlation features in all conditions (Fig. 9C), failing to account for the dissociation between power and correlation features. This is presumably because changes in power and correlation are tightly coupled via response latencies. In contrast, for the phase reset model, the performance with correlations began to rise earlier compared with power, except when the amplitude modulation was strong (M = 10; Fig. 9D, bottom). Phase resetting may cause only subtle differences in power via the summation of synchronized/desynchronized neuronal sources, resulting in a slower onset of decoding performance with power features.
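The intuition behind the phase reset result can be illustrated with a noise-free toy example: two oscillating sources whose phases are reset to category-specific values have identical power in both categories, yet their zero-lag correlation differs sharply. This sketch is not the simulation used in the study. It omits noise, amplitude modulation (M), and the summation of sources at electrodes, and the frequency (40 Hz), reset time (100 ms), and phase values are hypothetical.

```python
import numpy as np

def phase_reset_source(t, freq_hz, initial_phase, reset_time, reset_phase):
    """Sinusoidal source whose phase is set to reset_phase at reset_time."""
    phase = np.where(
        t < reset_time,
        2 * np.pi * freq_hz * t + initial_phase,
        2 * np.pi * freq_hz * (t - reset_time) + reset_phase,
    )
    return np.sin(phase)

t = np.arange(300) / 1000.0   # 300 ms sampled at 1 kHz (assumed rate)
post = slice(100, 300)        # samples after the reset at 100 ms

# Source A is reset to phase 0 in both categories; source B is reset to
# phase 0 for category 1 and to phase pi for category 2.
a = phase_reset_source(t, 40.0, 0.3, 0.1, 0.0)
b_cat1 = phase_reset_source(t, 40.0, 1.1, 0.1, 0.0)
b_cat2 = phase_reset_source(t, 40.0, 1.1, 0.1, np.pi)

corr_cat1 = np.corrcoef(a[post], b_cat1[post])[0, 1]
corr_cat2 = np.corrcoef(a[post], b_cat2[post])[0, 1]
power_cat1 = np.mean(b_cat1[post] ** 2)
power_cat2 = np.mean(b_cat2[post] ** 2)
```

In this idealized case the post-reset power of source B is the same for both categories, whereas the correlation with source A flips sign, which is the kind of dissociation between power and correlation features that the phase reset model reproduces.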

Discussion

In the present study, we have shown that temporal correlations of ECoG signals between electrodes provide additional information on visual object categories compared to information represented by spectral power or phase in individual electrodes (Figs. 5 and 6). The trial-shuffling procedure degraded decoding performance obtained with correlations for raw and gamma band ECoG signals (Fig. 7), indicating that relative time series between electrodes contain information about categories. Time course analysis using a sliding time window revealed that decoding performance obtained with correlations began to rise earlier compared with power (Fig. 8). In the simulation analysis, this difference between power and correlation features was reproduced in a model where neuronal electric sources encode information using their phase differences (Fig. 9). These results suggest that not only response amplitudes but also temporal correlations over multiple brain areas carry information on visual object categories, and that informative correlations can be derived from category-specific, relative time series of neural activity. The simulation results suggest that temporal correlations may reflect phase differences of neuronal electric sources in the temporal cortex at least in the early response period.

Although we focused on correlations of ECoG time series between electrodes without a time lag, the use of different features that take into account interactions between two different points in space and time could improve decoding performance. In an EEG study, it was reported that estimated off-diagonal coefficients in a multivariate autoregressive model are more informative than diagonal ones in classifying sleep stages (Penny and Roberts, 2002). It remains to be seen whether such sophisticated features are efficient for the decoding of neural object representations.

Several previous studies have proposed the possibility that visual information is encoded in temporal relations of spikes or field potentials between multiple brain locations, and more specifically, in the order of spike latencies (Gautrais and Thorpe, 1998; Thorpe et al., 2001; Van Rullen et al., 1998), spike sequences with millisecond precision (Abeles, 1991; Abeles et al., 1993; Lestienne and Strehler, 1987; Oram et al., 1999), and gamma band synchronization (Eckhorn et al., 1988; Engel et al., 1991; Gray et al., 1989). Some experimental results have shown that such patterns encode low-level visual information, such as light intensity in the retina (Gollisch and Meister, 2008), orientation in the primary visual cortex (Celebrini et al., 1993; Gawne et al., 1996; Shriki et al., 2012), and co-occurrence of edges (Eckhorn et al., 1988; Engel et al., 1991; Gray et al., 1989). Although our experimental data are not at the level of single neurons, ECoG signals have been assumed to provide an aggregate signature of transmembrane currents within a cortical area of several millimeters (Buzsáki et al., 2012; Reimann et al., 2013; Varela et al., 2001). Thus, informative correlations across electrodes in the IT cortex shown in our present study may derive from such temporal patterns encoding visual object categories.

Fig. 9. Simulation analysis. A, Neuronal responses in the latency coding model (before adding noise). In each trial, a source produces its own waveform with a latency specific to the given stimulus. B, Neuronal responses in the phase reset model (before adding noise). Sources behave as oscillators at a fixed frequency. In each trial, the phase of each source is reset to a value specific to the given stimulus at a time after the stimulus onset. C, Time course of decoding performance in the latency coding model. The time courses of decoding performance with no amplitude modulation (M = 0, top), with moderate amplitude modulation (M = 1, middle), and with strong amplitude modulation (M = 10, bottom) are shown. D, Time course of decoding performance in the phase reset model. The time courses of decoding performance with the three levels of amplitude modulation are shown.

Our shuffling procedure degraded decoding performance obtained with correlation features for the raw ECoG signals and the signals from the two highest frequency bands (Fig. 7). These results suggest that informative correlations partly arise from relative time series that are specific to each category and not time-locked to the stimulus onset across trials. On the other hand, the shuffling procedure did not degrade decoding performance in the three lower frequency bands. This may imply that informative correlations in those bands arise in a stimulus-locked manner. However, it is also possible that even if relative time series contributed to information coding in the lower frequency bands, the fluctuations of response delays across trials were too small relative to the low frequency cycles to have a substantial effect from shuffling.

We found that decoding performance with temporal correlations increased earlier than that with power values, and that the difference in the timing of the performance increase was reproduced by our phase reset model, where object categories were coded by phase differences rather than latency differences across neuronal sources. While latency differences can cause robust patterns in both power and correlation features, phase resetting may produce distinct patterns in correlation features but not in power features. Even in the phase reset model, spectral power could be modulated via the synchronization or desynchronization of neuronal sources near each electrode. However, the detection of such small changes in power in the presence of noise may require a broader temporal summation, resulting in a delayed increase in decoding performance. The results suggest that at least in the early response period, informative correlations originate from subtle temporal patterns like phase differences.

In our experiment, we did not rigorously match lower-level visual features across stimulus exemplars. The spatial frequency of an image has been known to be an important feature used by the visual system to assign category membership (Crouzet and Thorpe, 2011). Single neurons in the IT cortex have been known to show selectivity to specific complex shapes, which can be considered as components of objects (Tanaka, 1996). Our results are based only on population-level selectivity to object categories, and further analysis will be necessary to understand the exact nature of neural representation in IT that underlies object decoding.

A critical advantage of the use of ECoG is that it allows recording of electric signals simultaneously from multiple sites in the temporal cortex at high temporal resolution. In our shuffling analysis, we examined whether informative correlations derived from category-specific relative time series in each trial. This kind of analysis requires simultaneous recording of neural signals from multiple brain locations with high temporal resolution. In some studies using single-unit recordings, the signal from each electrode was recorded separately because of the difficulty of simultaneous recording from multiple sites (Georgopoulos and Massey, 1988; Gochin et al., 1994; Hung et al., 2005; Kiani et al., 2007; McAdams and Maunsell, 1999; Rolls et al., 1997). However, with this method, stimulus-unlocked patterns over multiple brain locations may be overlooked (Averbeck et al., 2006). Recording with a microelectrode array could be a promising method for such analysis, but the coverage of current state-of-the-art in vivo arrays is limited to an area of several millimeters and is not sufficient to cover the whole temporal cortex (Fejtl et al., 2006). Although other neuroimaging techniques such as fMRI, scalp EEG, and MEG provide simultaneous recordings from wide brain regions, the temporal resolution of fMRI is limited to the timescale of seconds, and the spatial resolution of scalp EEG and MEG is insufficient to reveal synchronization within two centimeters or less (Varela et al., 2001). ECoG provides better spatiotemporal resolution than EEG, MEG, and fMRI, making ECoG a useful tool to examine theoretical hypotheses about fine temporal coding. The combination of ECoG recordings and neural decoding techniques has become a powerful approach to the investigation of neural representations in the last decade (Chao et al., 2010; Hammer et al., 2013; Liu et al., 2009; Pasley et al., 2012; Tsuchiya et al., 2008; Yanagisawa et al., 2009). In addition, new, high-density ECoG electrode arrays are being developed (Hollenberg et al., 2006; Matsuo et al., 2011; Rubehn et al., 2009; Toda et al., 2011; Watanabe et al., 2012; Yeager et al., 2008). The use of these electrode arrays in monkeys and other animals may reveal even more detailed neural representations.

Acknowledgments

We thank Tomoyasu Horikawa, Paul Sukhanov, Atsushi Matsui, Nancy Wang, Makoto Takemiya, Satoshi Hirose, Makoto Fukushima, Kazushi Ikeda, and Yoichi Miyawaki for helpful discussion and comments. This work was supported by JSPS KAKENHI (11J08024 and 21390405), SRPBS from MEXT, Grant for Promotion of Niigata University Research Project, Grant for Comprehensive Research on Disability, Health and Welfare (H23-Nervous and Muscular-General-003) from MHLW, and 2008 Specified Research grant from Takeda Science Foundation.

Conflict of interest

The authors declare no competing financial interests.

References

Abeles, M., 1991. Corticonics: Neural circuits of the cerebral cortex. Cambridge UP, New York.

Abeles, M., Bergman, H., Margalit, E., Vaadia, E., 1993. Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol. 70, 1629-1638.

Averbeck, B.B., Latham, P.E., Pouget, A., 2006. Neural correlations, population coding and computation. Nat. Rev. Neurosci. 7, 358-366.

Buzsáki, G., Anastassiou, C.A., Koch, C., 2012. The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nat. Rev. Neurosci. 13, 407-420.

Celebrini, S., Thorpe, S., Trotter, Y., Imbert, M., 1993. Dynamics of orientation coding in area V1 of the awake primate. Vis. Neurosci. 10, 811-825.

Chao, Z.C., Nagasaka, Y., Fujii, N., 2010. Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys. Front. Neuroeng. 30, 3.

Crouzet, S.M., Thorpe, S.J., 2011. Low-level cues and ultra-fast face detection. Front. Psychol. 2, 342.

Desimone, R., Albright, T.D., Gross, C.G., Bruce, C., 1984. Stimulus-selective properties of inferior temporal neurons in the macaque. J. Neurosci. 4, 2051-2062.

Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., Reitboeck, H.J., 1988. Coherent oscillations: a mechanism of feature linking in the visual cortex? Multiple electrode and correlation analyses in the cat. Biol. Cybern. 60, 121-130.

Engel, A.K., König, P., Kreiter, A.K., Singer, W., 1991. Interhemispheric synchronization of oscillatory neuronal responses in cat visual cortex. Science 252, 1177-1179.

Fejtl, M., Stett, A., Nisch, W., Boven, K.H., Möller, A., 2006. On micro-electrode array revival: its development, sophistication of recording, and stimulation. In: Taketani, M., Baudry, M. (Eds.), Advances in Network Electrophysiology: Using Multi-electrode Arrays. Springer, New York, pp. 24-37.

Gautrais, J., Thorpe, S., 1998. Rate coding versus temporal order coding: a theoretical approach. Biosystems 48, 57-65.

Gawne, T.J., Kjaer, T.W., Richmond, B.J., 1996. Latency: another potential code for feature binding in striate cortex. J. Neurophysiol. 76, 1356-1360.

Georgopoulos, A.P., Massey, J.T., 1988. Cognitive spatial-motor processes. 2. Information transmitted by the direction of two-dimensional arm movements and by neuronal populations in primate motor cortex and area 5. Exp. Brain Res. 69, 315-326.

Gochin, P.M., Colombo, M., Dorfman, G.A., Gerstein, G.L., Gross, C.G., 1994. Neural ensemble coding in inferior temporal cortex. J. Neurophysiol. 71, 2325-2337.

Gollisch, T., Meister, M., 2008. Rapid neural coding in the retina with relative spike latencies. Science 319, 1108-1111.

Gray, C.M., König, P., Engel, A.K., Singer, W., 1989. Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature 338, 334-337.

Hammer, J., Fischer, J., Ruescher, J., Schulze-Bonhage, A., Aertsen, A., Ball, T., 2013. The role of ECoG magnitude and phase in decoding position, velocity, and acceleration during continuous motor behavior. Front. Neurosci. 7, 200.

Hollenberg, B.A., Richards, C.D., Richards, R., Bahr, D.F., Rector, D.M., 2006. A MEMS fabricated flexible electrode array for recording surface field potentials. J. Neurosci. Methods 153, 147-153.

Hung, C.P., Kreiman, G., Poggio, T., DiCarlo, J.J., 2005. Fast readout of object identity from macaque inferior temporal cortex. Science 310, 863-866.

Kamitani, Y., Tong, F., 2005. Decoding the visual and subjective contents of the human brain. Nat. Neurosci. 8, 679-685.

Kiani, R., Esteky, H., Mirpour, K., Tanaka, K., 2007. Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J. Neurophysiol. 97, 4296-4309.

Kreiman, G., Hung, C.P., Kraskov, A., Quiroga, R.Q., Poggio, T., DiCarlo, J.J., 2006. Object selectivity of local field potentials and spikes in the macaque inferior temporal cortex. Neuron 49, 433-445.

Kriegeskorte, N., Mur, M., Ruff, D.A., Kiani, R., Bodurka, J., Esteky, H., Tanaka, K., Bandettini, P.A., 2008. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126-1141.

Lesaffre, E., Rizopoulos, D., Tsonaka, R., 2007. The logistic transform for bounded outcome scores. Biostatistics 8, 72-85.

Lestienne, R., Strehler, B.L., 1987. Time structure and stimulus dependence of precisely replicating patterns present in monkey cortical neuronal spike trains. Brain Res. 437, 214-238.

Liu, H., Agam, Y., Madsen, J.R., Kreiman, G., 2009. Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex. Neuron 62, 281-290.

Lopour, B.A., Tavassoli, A., Fried, I., Ringach, D.L., 2013. Coding of information in the phase of local field potentials within human medial temporal lobe. Neuron 79, 594-606.

Matsuo, T., Kawasaki, K., Osada, T., Sawahata, H., Suzuki, T., Shibata, M., Miyakawa, N., Nakahara, K., Iijima, A., Sato, N., Kawai, K., Saito, N., Hasegawa, I., 2011. Intrasulcal electrocorticography in macaque monkeys with minimally invasive neurosurgical protocols. Front. Syst. Neurosci. 5, 34.

McAdams, C.J., Maunsell, J.H., 1999. Effects of attention on the reliability of individual neurons in monkey visual cortex. Neuron 23, 765-773.

Naruse, Y., Matani, A., Miyawaki, Y., Okada, M., 2010. Influence of coherence between multiple cortical columns on alpha rhythm: a computational modeling study. Hum. Brain Mapp. 31, 703-715.

Nunez, P.L., 1981. Electric fields of the brain: the neurophysics of EEG. Oxford UP, New York.

Oram, M.W., Wiener, M.C., Lestienne, R., Richmond, B.J., 1999. Stochastic nature of precisely timed spike patterns in visual system neuronal responses. J. Neurophysiol. 81, 3021-3033.

Palva, S., Palva, J.M., 2007. New vistas for alpha-frequency band oscillation. Trends Neurosci. 30, 150-158.

Pasley, B.N., David, S.V., Mesgarani, N., Flinker, A., Shamma, S.A., Crone, N.E., Knight, R.T., Chang, E.F., 2012. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251.

Penny, W.D., Roberts, S.J., 2002. Bayesian multivariate autoregressive models with structured priors. IEE Proc. Vis. Image Signal Process. 149, 33-41.

Perrett, D.I., Rolls, E.T., Caan, W., 1982. Visual neurones responsive to faces in the monkey temporal cortex. Exp. Brain Res. 47, 329-342.

Reimann, M.W., Anastassiou, C.A., Perin, R., Hill, S.L., Markram, H., Koch, C., 2013. A biophysically detailed model of neocortical local field potentials predicts the critical role of active membrane currents. Neuron 79, 375-390.

Rolls, E.T., Treves, A., Tovee, M.J., 1997. The representational capacity of the distributed encoding of information provided by populations of neurons in primate temporal visual cortex. Exp. Brain Res. 114, 149-162.

Rubehn, B., Bosman, C., Oostenveld, R., Fries, P., Stieglitz, T., 2009. A MEMS-based flexible multichannel ECoG-electrode array. J. Neural Eng. 6, 036003.

Shriki, O., Kohn, A., Shamir, M., 2012. Fast coding of orientation in primary visual cortex. PLoS Comput. Biol. 8, e1002536.

Talairach, J., Tournoux, P., 1993. Referentially oriented cerebral MRI anatomy: an atlas of stereotaxic anatomical correlations for gray and white matter. Thieme Medical Publishers, New York.

Tanaka, K., 1996. Inferotemporal cortex and object vision. Annu. Rev. Neurosci. 19, 109-139.

Thorpe, S., Delorme, A., Van Rullen, R., 2001. Spike-based strategies for rapid processing. Neural Netw. 14, 715-725.

Toda, H., Suzuki, T., Sawahata, H., Majima, K., Kamitani, Y., Hasegawa, I., 2011. Simultaneous recording of ECoG and intracortical neuronal activity using a flexible multichannel electrode-mesh in visual cortex. NeuroImage 54, 203-212.

Tsao, D.Y., Freiwald, W.A., Tootell, R.B., Livingstone, M.S., 2006. A cortical region consisting entirely of face-selective cells. Science 311, 670-674.

Tsuchiya, N., Kawasaki, H., Oya, H., Howard III, M.A., Adolphs, R., 2008. Decoding face information in time, frequency and space from direct intracranial recordings of the human brain. PLoS One 3, e3892.

Van Rullen, R., Gautrais, J., Delorme, A., Thorpe, S., 1998. Face processing using one spike per neurone. Biosystems 48, 229-239.

Vapnik, V., 1998. Statistical learning theory. John Wiley & Sons, New York.

Varela, F., Lachaux, J.P., Rodriguez, E., Martinerie, J., 2001. The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci. 2, 229-239.

Vindiola, M., Wolmetz, M., 2011. Mental encoding and neural decoding of abstract cognitive categories: a commentary and simulation. NeuroImage 54, 2822-2827.

Watanabe, H., Sato, M.A., Suzuki, T., Nambu, A., Nishimura, Y., Kawato, M., Isa, T., 2012. Reconstruction of movement-related intracortical activity from micro-electrocorticogram array signals in monkey primary motor cortex. J. Neural Eng. 9, 036006.

Yamashita, O., Sato, M.A., Yoshioka, T., Tong, F., Kamitani, Y., 2008. Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage 42, 1414-1429.

Yanagisawa, T., Hirata, M., Saitoh, Y., Kato, A., Shibuya, D., Kamitani, Y., Yoshimine, T., 2009. Neural decoding using gyral and intrasulcal electrocorticograms. NeuroImage 45, 1099-1106.

Yeager, J.D., Phillips, D.J., Rector, D.M., Bahr, D.F., 2008. Characterization of flexible ECoG electrode arrays for chronic recording in awake rats. J. Neurosci. Methods 173, 279-285.