Semantic attributes are encoded in human electrocorticographic signals during visual object recognition

CC BY-NC-ND
Academic journal: NeuroImage
Keywords: Semantics, Electrocorticography, High-gamma activity, Encoding models, Object recognition


Author's Accepted Manuscript

Semantic attributes are encoded in human electrocorticographic signals during visual object recognition

Kyle Rupp, Matthew Roos, Griffin Milsap, Carlos Caceres, Christopher Ratto, Mark Chevillet, Nathan E. Crone, Michael Wolmetz


PII: S1053-8119(16)30800-X

DOI: http://dx.doi.org/10.1016/j.neuroimage.2016.12.074

Reference: YNIMG13695

To appear in: Neuroimage

Received date: 2 August 2016 Revised date: 21 December 2016 Accepted date: 26 December 2016

Cite this article as: Kyle Rupp, Matthew Roos, Griffin Milsap, Carlos Caceres, Christopher Ratto, Mark Chevillet, Nathan E. Crone and Michael Wolmetz, Semantic attributes are encoded in human electrocorticographic signals during visual object recognition, Neuroimage,

http://dx.doi.org/10.1016/j.neuroimage.2016.12.074

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Running head: Semantic Attributes in ECoG

Semantic attributes are encoded in human electrocorticographic signals during visual object recognition

Kyle Rupp (a), Matthew Roos (b), Griffin Milsap (a), Carlos Caceres (b), Christopher Ratto (b), Mark Chevillet (b), Nathan E. Crone (c), Michael Wolmetz (b,*)

(a) Department of Biomedical Engineering, Johns Hopkins University, 720 Rutland Ave., Baltimore, MD 21205, USA

(b) The Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd., Laurel, MD 20723, USA

(c) Department of Neurology, Johns Hopkins University, 600 N. Wolfe St., Meyer 2-161, Baltimore, MD 21287, USA

* Corresponding author. E-mail address: mikew@jhu.edu

ABSTRACT

Non-invasive neuroimaging studies have shown that semantic category and attribute information are encoded in neural population activity. Electrocorticography (ECoG) offers several advantages over non-invasive approaches, but the degree to which semantic attribute information is encoded in ECoG responses is not known. We recorded ECoG while patients named objects from 12 semantic categories and then trained high-dimensional encoding models to map semantic attributes to spectral-temporal features of the task-related neural responses. Using these semantic attribute encoding models, untrained objects were decoded with accuracies comparable to whole-brain functional Magnetic Resonance Imaging (fMRI), and we observed that high-gamma activity (70-110 Hz) at basal occipitotemporal electrodes was associated with specific semantic dimensions (manmade-animate, canonically large-small, and places-tools). Individual patient results were in close agreement with reports from other imaging modalities on the time course and functional organization of semantic processing along the ventral visual pathway during object recognition. The semantic attribute encoding model approach is critical for decoding objects absent from a training set, as well as for studying complex semantic encodings without artificially restricting stimuli to a small number of semantic categories.

Key words: Semantics, Electrocorticography, High-gamma activity, Encoding models, Object recognition

Abbreviations: Blood Oxygen Level Dependent (BOLD), Electrocorticography (ECoG), functional Magnetic Resonance Imaging (fMRI), Magnetoencephalography (MEG), mean rank accuracy (MRA), Representational Similarity Analysis (RSA)

Acknowledgments: [...] Laboratory Internal Research and Development funding.

Conflict of interest: The authors declare no competing financial interests.

1 INTRODUCTION

The view that objects are encoded according to their semantic attributes or features, while not new, has become quite practical. Under an attribute-based view, a concept can be encoded over a large set of meaningful attributes, with each attribute assigned a value or set of values related to its probability, weight, or importance (Rosch, 1978). For example, the encoding of the concept "bird" assigns high probabilities to attributes typical of birds (has beak, flies, etc.) and low or zero probabilities to attributes atypical of birds (has four legs, manmade, etc.). Substantial work has been done to catalogue the attributes and weights associated with different concepts, and attribute ratings can account for a host of human judgments about the relationships between concepts and the organization of categories (Binder et al., 2016; Cree and McRae, 2003; Garrard et al., 2001; Ruts et al., 2004). In related work on vector space models of semantics, automated methods can be used in place of human annotators to learn latent semantic features from the statistical properties of words and phrases in large text corpora (Deerwester et al., 1990; Mikolov et al., 2013; Pennington et al., 2014), and these latent features are similarly useful in accounting for human judgments (Pereira et al., 2016).

Efforts to decompose concepts into their constituent attributes or features have been used to great effect in the study of knowledge representation in the human brain. Following methods pioneered by Mitchell and colleagues (2008) to learn relationships between individual semantic features and the neural activity patterns they evoke, subjects perform tasks that require semantic processing - viewing or naming objects (Clarke et al., 2014), reading words or sentences (Wehbe et al., 2014), considering semantic attributes (Sudre et al., 2012), generating category exemplars (Simanova et al., 2015), watching movies (Huth et al., 2012), or listening to stories (Huth et al., 2016) - while neural responses are recorded with functional magnetic resonance imaging (fMRI) or magnetoencephalography (MEG). Because stimuli can be represented in terms of their constituent semantic attributes or features, a mapping can be learned between each semantic feature and its associated neural responses (i.e. voxel intensities, MEG sensor amplitudes), typically through linear regression. These encoding models project semantic features into a neural feature space, and similarly, decoding models can be used to project recorded neural activity patterns into a semantic feature space.

The resulting neurosemantic models have provided new insights into conceptual knowledge representation in the mind and brain. The fact that neurosemantic models can be used to successfully learn mappings between semantic and neural features suggests that the brain's representation of objects involves decomposition into semantic features. This paradigm also provides a framework for testing theories about what specific semantic features are represented in the human brain (Just et al., 2010), how they are encoded in neural activity (Huth et al., 2016), and how cognitive processes modulate neurosemantic representations (Çukur et al., 2013). Likewise, from a decoding perspective, decompositional neurosemantic models are very powerful in that they can interpret neural activity from concept classes they have not been trained on in a process termed zero-shot learning (Palatucci et al., 2009).

The impact of this approach, though, is limited by the quality and quantity of available neural data. Non-invasive neuroimaging methods are subject to lower signal-to-noise ratios, trade-offs between temporal and spatial resolution, and indirect estimates of neural activity. Invasive alternatives like electrocorticography (ECoG) can only be used in the relatively rare clinical setting when implanting electrodes on the surface of the cortex is a clinical necessity. As a result, spatial coverage is determined solely by clinical considerations, which leads to varied anatomical sampling across patients. At the same time, ECoG offers high temporal resolution, a high signal-to-noise ratio due to direct contact between electrodes and the cortical surface, and more direct observations of neural processing. Evidence of this can be found in studies showing that ECoG responses correlate well with spiking activity (Manning et al., 2009; Ray et al., 2008) and hemodynamic responses (Logothetis et al., 2001; Niessing et al., 2005), with activity in high-gamma frequencies (e.g. 70-110 Hz) serving as a particularly good index of underlying neural processing.

Despite the potential advantages, there have been relatively few studies of semantic attribute representation using ECoG. The few attempts to use ECoG for semantic decoding have relied on discriminative approaches over a small number of trained classes or categories (Liu et al., 2009; Wang et al., 2011). In one of the only published examples of semantic decoding from ECoG, Wang and colleagues (2011) asked patients to perform several different tasks that activated representations of semantic properties (e.g. visual object naming), and then trained Support Vector Machine (SVM) and Gaussian Naïve Bayes (GNB) classifiers to decode the evoked responses to one of the three possible target categories (i.e. foods, tools, and body parts). Performance varied across subjects, tasks, and classifier types, with mean classification rates of approximately 56% correct and a range from approximately 40% to 74% (where 33% is chance), indicating that substantial category information can be extracted from ECoG. While encouraging, conclusions drawn from a very restricted number of classes (e.g. foods, tools, and body parts) or dimensions of variation (e.g. living vs. non-living, large vs. small) may be partially confounded by expectation and perceptual set effects that cause subjects to artificially attend to and process these dimensions.

Chen and colleagues (2016) recently extended the ECoG-based study of semantic representations to 100 objects across a range of semantic categories and attributes. They adapted the searchlight-based Representational Similarity Analysis (RSA) typically used in fMRI to assess where and when semantic information was encoded in ECoG responses during a picture naming task. The technique assumes that semantic information is represented as complex spatiotemporal patterns detectable by ECoG, and looks for spatiotemporal structure in the neural responses that correlate with the structure inherent in semantic representations. Using RSA, the authors found evidence of semantic encoding in the ventral pathway from basal occipital-temporal to anterior temporal lobe regions, but these results were primarily accounted for by a simple binary semantic model that coded items as either living or non-living (stimuli were evenly split between these two categories).

ECoG research in other language domains like speech perception (Pasley et al., 2012) and speech production (Mugler et al., 2014) has used generative encoding and decoding approaches to study language processing as it naturally varies across a range of stimuli and dimensions, but these methods have yet to be applied to semantics. In the current report, we adapted the decompositional semantic encoding approach previously used with fMRI and MEG data for ECoG (as summarized in Figure 1) to assess the degree to which semantic attributes are encoded in the ECoG signal. To do this, we recorded ECoG while patients named objects from 12 different semantic categories. Using these responses, high-dimensional semantic attribute encoding models were trained to decode objects unseen during model training. The trained models were then analyzed in terms of which semantic dimensions were reliably encoded across different electrodes, time points, and frequency bands.

2 MATERIALS and METHODS

2.1 Data

2.1.1 Subjects

Electrocorticography was recorded from 9 patients with intractable epilepsy (2 female, 31-44 years old) during in-patient monitoring for pre-surgical localization of their ictal onset zone and eloquent cortex. All patients provided informed consent according to a protocol approved by the Johns Hopkins Medicine Institutional Review Boards.

2.1.2 Paradigm

Patients performed a standard visual object naming task with the same 60 line drawings used by Mitchell et al. (2008). Briefly, white line drawings were presented on a black background, with a centered white fixation cross present during inter-stimulus intervals. Drawings were shown for 1 s, with an inter-stimulus interval varying randomly between 3.5 and 4.5 s. Patients were instructed to name the pictured object as soon as it came to mind, or to say "pass" when they could not immediately answer. Four patients were familiarized with the stimuli beforehand using one of two procedures: two patients were simply shown the stimuli with labels and instructed to learn them, and two patients were asked to provide a verbal description for each object. The remaining five patients were not exposed to the stimuli prior to the naming task.

The 60-item stimulus set consisted of 5 different objects from each of 12 different categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen utensils, man-made objects, tools, vegetables, and vehicles; for a complete list of objects by category, see Supplementary Table 2). For each patient, six blocks of data were collected. All 60 objects were shown in each block in a pseudorandom order. Both the behavioral paradigm and the ECoG data recording were implemented with BCI2000 (Schalk et al., 2004). Verbal responses and stimulus onset were both recorded through the analog input bank using a microphone and photodiode, respectively. Behavioral performance was overall very good for all patients (Table 1). Occasional naming errors were not excluded from ECoG analysis.

2.1.3 ECoG recordings

Data analyzed from patients P2 through P9 were collected with standard ECoG grids and strips, each of which contained electrodes with 4 mm diameter and 10 mm spacing. For one patient, P1, a subset of analyzed electrodes was part of a high-density grid with 2 mm diameter and 5 mm spacing. Additional depth and micro-ECoG electrode arrays were implanted in a subset of patients but were not analyzed. Electrode placements were determined by clinical criteria and varied widely across patients (Table 1). For specific electrode locations in each of the 9 patients, see Supplementary Figure 1.

Patient | Gender | Age | LD | Electrode Placement | Electrodes (#) | Pathology | Stimulus Familiarization | Naming Accuracy
P1 | Female | | | Left temporal HD grid, left fronto-parietal grid, basal strips | | Low grade glioma in posterior left middle temporal gyrus | |
P2 | | | | Left fronto-temporal grid, superior frontal grid, inferior frontal strip, basal strips | | No lesion | Presentation with captions |
P3 | | | | Left temporal grid, basal grid | | Gangliocytoma (WHO Grade I) in posterior inferior temporal gyrus | |
P4 | | | | Right fronto-temporal grid, basal strips | | No lesion | |
P5 | | | | Bilateral strips | | No lesion | Presentation with captions |
P6 | | | | Right fronto-temporal grid, basal strips, frontal strips, occipito-parietal strip | | No lesion | Object description |
P7 | | | | Right parietal grid, frontal strip, posterior basal strips | | Previous right anterior temporal lobectomy with amygdalohippocampectomy | |
P8 | Male | 33 | R | Left occipital grid, temporal grid and strips, basal strips | 95 | Previous left anterior temporal lobectomy with amygdalohippocampectomy | Object description | 89%
P9 | Female | 35 | L | Bilateral strips | 106 | No lesion | None | 98%

Table 1. Summary of patient demographics, electrode placement, and task performance.

Hemispheric language dominance (abbreviated here as LD) was verified in all patients by intracarotid amobarbital testing, fMRI, and/or electrocortical stimulation mapping.

2.1.4 Signal processing

ECoG signals were sampled at 1000 Hz, digitized, and recorded using the BlackRock Neuroport system. Recordings were made with a referential montage in which the reference was a contact on a 4-electrode strip that had been implanted for this purpose facing the dura mater, or in which the reference was a cortical contact chosen because of its greater distance from most other recording contacts and because of its low likelihood of functional responses. Channels were visually inspected and those identified as containing excessive noise were discarded. A common average reference, where each electrode was referenced to the grid or strip to which it belonged, was used to minimize spatial bias from the reference electrode. Signals were low-pass filtered to prevent aliasing, resampled to 256 Hz, and epoched by clipping from 250 ms before stimulus onset to 4000 ms post stimulus onset.
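The re-referencing and epoching steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function names `common_average_reference` and `epoch`, the array layout (electrodes by samples), and the grid labels are all assumptions introduced here.

```python
import numpy as np

def common_average_reference(data, group_ids):
    """Subtract, at each sample, the mean of all electrodes on the same grid/strip.

    data: array (n_electrodes, n_samples); group_ids: one label per electrode.
    """
    data = np.asarray(data, dtype=float)
    group_ids = np.asarray(group_ids)
    out = np.empty_like(data)
    for g in np.unique(group_ids):
        members = group_ids == g
        out[members] = data[members] - data[members].mean(axis=0, keepdims=True)
    return out

def epoch(data, onset_samples, fs=256, pre_ms=250, post_ms=4000):
    """Cut epochs from pre_ms before to post_ms after each stimulus onset."""
    pre = int(round(pre_ms * fs / 1000))    # 64 samples at 256 Hz
    post = int(round(post_ms * fs / 1000))  # 1024 samples at 256 Hz
    return np.stack([data[:, t - pre:t + post] for t in onset_samples])

# Synthetic stand-in for resampled recordings: 6 electrodes, 20 s at 256 Hz.
rng = np.random.default_rng(0)
raw = rng.standard_normal((6, 256 * 20))
groups = ["grid", "grid", "grid", "grid", "strip", "strip"]  # hypothetical labels
car = common_average_reference(raw, groups)
epochs = epoch(car, onset_samples=[1000, 2500, 4000])  # (trials, electrodes, samples)
```

Referencing each electrode to its own grid or strip, rather than to all electrodes, keeps noise that is specific to one array from leaking into the others.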

Spectral power was extracted using one of two different techniques, autoregressive estimation or the Hilbert transform, depending on the goal. Autoregressive estimation permitted the extraction of a broad range of frequencies at the cost of temporal resolution. Models that used all extracted frequency bands were trained and tested, as were models that used high-gamma features only. While models using all frequency bands achieved the best performance, models using high-gamma only performed nearly as well, and thus subsequent analyses explored models using only high-gamma features. Once we made this determination, Hilbert estimation was used to extract high-gamma features with higher time resolution, permitting analyses on the timing and cortical locations related to semantic processing (Oppenheim and Schafer, 1999).

Autoregressive spectral estimation was performed using the Burg method (Kay, 1988) with 500 ms windows and a 250 ms overlap. The log spectral power values were averaged over multiple frequency bins. Frequency bins consisted of delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (15-30 Hz), gamma (30-50 Hz), and high-gamma (70-110 Hz). The autoregressive filter order was set to 26, and spectral estimation was performed with a frequency resolution of 2 Hz. To estimate high-gamma features using the Hilbert transform, data were first forward and backward filtered to a passband of 70-110 Hz using a 3rd order Butterworth filter. The Hilbert transform was used to generate the analytic signal, and the magnitude of this signal was squared to calculate signal power. Features were then calculated by averaging over 250 ms windows, sliding every 31.25 ms.
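The Hilbert-based feature extraction can be illustrated with SciPy. The sketch below is an assumption-laden example rather than the authors' pipeline: it assumes filtering is applied to the 256 Hz resampled signal, uses second-order sections for numerical stability, and passes order 3 to `scipy.signal.butter` (which, for a band-pass design, yields a filter of twice that order; the text does not say which convention was used). The function name `high_gamma_features` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def high_gamma_features(x, fs=256.0, band=(70.0, 110.0), win_ms=250.0, step_ms=31.25):
    """Windowed high-gamma power for one channel (1-D array)."""
    sos = butter(3, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x)            # forward-backward (zero-phase) filtering
    power = np.abs(hilbert(filtered)) ** 2    # squared magnitude of the analytic signal
    win = int(round(win_ms * fs / 1000))      # 64 samples at 256 Hz
    step = int(round(step_ms * fs / 1000))    # 8 samples at 256 Hz
    starts = np.arange(0, len(power) - win + 1, step)
    return np.array([power[s:s + win].mean() for s in starts])

# Synthetic check: a 90 Hz tone lies inside the passband, a 20 Hz tone does not.
t = np.arange(1088) / 256.0                   # one 4.25 s epoch
feats_hg = high_gamma_features(np.sin(2 * np.pi * 90 * t))
feats_low = high_gamma_features(np.sin(2 * np.pi * 20 * t))
```

With a 250 ms window and a 31.25 ms step, consecutive features share most of their samples, so neighbouring time points are strongly correlated by construction.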

2.2 Encoding Model

Linear ridge regression was used to learn the encoding model parameters (Chen et al., 2014; Friedman et al., 2001) relating semantic attribute ratings (section 2.2.2) to neural activity features (section 2.2.1). Linear ridge regression is a least-squares technique that employs regularization via an l2-norm penalty, where this penalty effectively biases coefficients toward zero in exchange for reducing the variance on their estimates. Regularization is usually necessary in regression problems involving high-dimensional data as a safeguard against over-fitting. Our encoding model predicted neural features from semantic features by linear weighting:

$$n_m = \mathbf{s}^{T}\mathbf{b}_m + \varepsilon_m$$

where $n_m$ was the $m$th neural feature, $\mathbf{s}$ was a vector of semantic features associated with the stimulus, $\mathbf{b}_m$ was a vector of weights, and $\varepsilon_m$ was an error term. The ridge regression solution for determining the weights was given by

$$\hat{\mathbf{b}}_m = \underset{\mathbf{b}_m}{\arg\min}\; \lVert S\mathbf{b}_m - \mathbf{n}_m \rVert_2^2 + \lambda \lVert \mathbf{b}_m \rVert_2^2 = (S^{T}S + \lambda I)^{-1} S^{T}\mathbf{n}_m$$

where $\mathbf{n}_m$ was a vector of $m$th neural features for a set of trials, $S$ was a $J \times K$ matrix of $K = 218$ semantic features for $J = 354$ trials ($J = 354$ rather than 360 because six trials for a held-out object are not used when training the model), and $\lambda$ was the regularization parameter. The semantic feature matrix may contain row vectors for repeated trials. Neural feature vectors, $\mathbf{n}_m$, were normalized to zero mean and unit variance over all trials. Initial testing using a leave-one-out cross-validation to optimize $\lambda$ from seven logarithmically-spaced values between 1 and 1000 often produced $\lambda = 3.16$, and so this value was adopted for all models. This determination was made in initial testing of data from P2 using earlier versions of the encoding model and was not specifically optimized to any models or results reported here. Fitting patient-specific $\lambda$ values might have improved overall performance of the encoding models slightly.
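As a minimal sketch of the closed-form ridge solution, the example below fits weights for a single neural feature on synthetic data with the dimensions given in the text (354 trials, 218 semantic features). The function name `ridge_weights` and the simulated data are assumptions for illustration only.

```python
import numpy as np

def ridge_weights(S, n, lam):
    """Closed-form ridge solution b = (S^T S + lam I)^-1 S^T n."""
    K = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(K), S.T @ n)

rng = np.random.default_rng(0)
J, K = 354, 218                       # trials x semantic features, as in the text
S = rng.standard_normal((J, K))       # synthetic stand-in for the semantic matrix
b_true = rng.standard_normal(K)
n = S @ b_true + rng.standard_normal(J)
n = (n - n.mean()) / n.std()          # zero mean, unit variance over trials

lambdas = np.logspace(0, 3, 7)        # seven log-spaced values between 1 and 1000
lam = lambdas[1]                      # 10**0.5, i.e. about 3.16
b_hat = ridge_weights(S, n, lam)
n_pred = S @ b_hat                    # predicted neural feature for each trial
```

Because `np.linalg.solve` accepts a matrix on the right-hand side, passing a trials-by-features matrix in place of `n` would fit all neural features in one call with the same regularization value.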

2.2.1 Neural features

Each neural feature used by the encoding model can be described as the signal power from a specific electrode in a specific frequency band during a specific time window. To limit the inclusion of neural features to those primarily associated with semantic processing, neural features were restricted to times 0 to 750 ms post stimulus onset. To determine the degree to which neural features may have been associated with spoken responses, additional analyses compared the timing of spoken responses with the semantic decoding performance over time. We found that peak decoding tended to occur before the vast majority of spoken response onsets (see section 3.3 in Results).

Neural features were selected for their stability over stimulus presentations, similar to correlation-based stability measures used with fMRI to select voxels (Mitchell et al., 2008; Shinkareva et al., 2008). This approach was chosen because it is computationally straightforward and is commonly used in similar studies. In this procedure, neural features are selected that have stable response profiles across repeated presentations of the same item. Correlation stability was calculated for a neural feature by averaging all pairwise Pearson correlations between responses in blocks of trials. For example, for a data set with 60 objects and six blocks of object presentations, the correlation stability of a neural feature is produced by calculating the correlation between the 60 responses in block i to the 60 responses in block j and averaging correlation coefficients for all possible ij pairs (15 in total). The most stable neural features were selected for use in the encoding model, up to 200 features. Features were always selected on training data as part of a nested cross-validation process.
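The correlation stability measure can be written compactly in NumPy. The following is a sketch on synthetic data; the function name `correlation_stability` and the array layout (blocks by objects by features) are assumptions made for this example.

```python
import numpy as np
from itertools import combinations

def correlation_stability(responses):
    """responses: (n_blocks, n_objects, n_features). Returns one score per feature."""
    n_blocks = responses.shape[0]
    # z-score each block's profile over objects, so Pearson r is a mean of products
    z = responses - responses.mean(axis=1, keepdims=True)
    z = z / z.std(axis=1, keepdims=True)
    pairs = list(combinations(range(n_blocks), 2))   # 15 pairs for six blocks
    r = [(z[i] * z[j]).mean(axis=0) for i, j in pairs]
    return np.mean(r, axis=0)

# Synthetic data: feature 0 responds consistently to items, the rest are noise.
rng = np.random.default_rng(1)
item_signal = rng.standard_normal(60)
responses = rng.standard_normal((6, 60, 5))
responses[:, :, 0] += 3.0 * item_signal
stability = correlation_stability(responses)
top = np.argsort(stability)[::-1][:2]   # most stable features (the study kept up to 200)
```

In a cross-validated setting, `responses` would contain only the training objects, so the held-out object never influences which features are kept.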

2.2.2 Semantic features

Encoding models used semantic features from the human218 semantic knowledge base (Palatucci et al., 2009; Sudre et al., 2012) consisting of 218 interpretable semantic attributes. This semantic model was chosen because the 218 meaningful dimensions facilitated analysis of semantic dimension encoding at specific cortical sites (see section 2.3.2). Attribute ratings were collected by Palatucci et al. (2009) by asking a series of 218 questions to a group of Amazon Mechanical Turk users about 1000 different nouns, including the 60 objects included in this study. Questions probed a variety of semantic properties, including size, usage, composition, and category. Answers were given on an ordinal scale from 1 to 5 (definitely not to definitely yes) and then rescaled to a range of -1 to 1. Lastly, the 218-element vector for each object was scaled to unity length. Each object was represented in this 218-dimensional semantic space in subsequent analyses.
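The rescaling of ratings can be illustrated with a toy vector. The text does not state the exact mapping from the 1-5 scale to the -1 to 1 range; the sketch below assumes a linear map, (r - 3) / 2, and the function name `semantic_vector` is hypothetical.

```python
import numpy as np

def semantic_vector(ratings):
    """Map 1-5 ratings to [-1, 1] (linear map assumed) and scale to unit length."""
    r = np.asarray(ratings, dtype=float)
    scaled = (r - 3.0) / 2.0
    return scaled / np.linalg.norm(scaled)

# Toy 5-attribute example; the study used 218 attributes per object.
v = semantic_vector([5, 1, 3, 4, 2])
```

Under this assumed mapping, a neutral rating of 3 contributes zero to the vector, and unit-length scaling removes differences in overall rating magnitude between objects.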

For each model, performance was assessed via rank accuracy for decoding held-out objects. To decode a held-out neural activity pattern ñ generated by an untrained object, ñ was compared via cosine distance to a set of predicted neural activity patterns generated by applying encoding model P to the semantic attribute vectors for the 60 objects. These distances determined the relative ranks of the 60 objects, and rank accuracy for the correct object ranked ith among 60 objects was computed as 100*(60 - i)/59. Within-category rank accuracies were also computed by limiting the possible objects to the correct object and the remaining four objects falling in the same semantic category as the held-out object. Rank accuracy is one of several possible metrics for assessing encoding model performance; qualitatively similar results were achieved using alternative metrics (e.g. leave-two-out PairedPerf, as in Mitchell et al., 2008).
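The rank accuracy computation can be sketched as follows, using random vectors in place of real predicted and observed neural patterns. The function name `rank_accuracy` and its argument layout are assumptions for this example.

```python
import numpy as np

def rank_accuracy(observed, predicted, true_index):
    """observed: (n_features,); predicted: (n_objects, n_features)."""
    cos_sim = predicted @ observed / (
        np.linalg.norm(predicted, axis=1) * np.linalg.norm(observed))
    distances = 1.0 - cos_sim
    order = np.argsort(distances)                       # closest candidate first
    i = int(np.where(order == true_index)[0][0]) + 1    # 1-based rank of the correct object
    n = predicted.shape[0]
    return 100.0 * (n - i) / (n - 1)                    # 100*(60 - i)/59 for 60 objects

rng = np.random.default_rng(2)
predicted = rng.standard_normal((60, 200))              # synthetic predicted patterns
best = rank_accuracy(predicted[7], predicted, 7)        # correct object ranked first
worst = rank_accuracy(-predicted[7], predicted, 7)      # correct object ranked last
noisy = rank_accuracy(predicted[7] + rng.standard_normal(200), predicted, 7)
```

A within-category score follows from the same function by passing only the five predicted patterns from the held-out object's category, together with the index of the correct object within that subset.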

FIGURE 1 HERE

2.2.3 Model validation

Training and testing of encoding models consisted of three phases: feature selection, model training, and model testing. Model training, i.e. estimation of model weights, was always performed using individual trials, rather than aggregating the six presentations to produce a single neural activity pattern for an object. Feature selection and model testing were performed using one of four different conditions: (1) responses averaged over all six presentations and from all frequency bands, (2) responses averaged over all six presentations and from high-gamma only, (3) single-trial responses from all frequency bands, and (4) single-trial responses from high-gamma only. The testing for the response-averaged condition involved calculating rank accuracies by comparing 60 predicted neural activity patterns to a neural activity pattern averaged over six recordings, while the testing for the single-trial condition compared predicted neural activity patterns to a single recording six times.


Features were selected via cross-validation to optimize either the single-trial or the response-averaged rank accuracy. A nested leave-one-object-out cross-validation procedure was used for all model validation. The 60-fold outer loop of the procedure was used to iterate over each of the 60 objects as the held-out object. For each of these folds, a 59-fold inner loop was used to select the optimal neural feature set for the held-out object. For a given fold of the outer loop, the correlation stability was calculated for each neural feature. Then, for each fold of the corresponding inner loop, mean rank accuracy (MRA) was calculated for encoding models using between 1 and 200 of the features with the highest correlation stability (logarithmically spaced). The MRA curves were averaged across folds of the inner loop, and the number of features that produced the maximum MRA was adopted for that outer fold.
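The structure of the nested procedure can be sketched on a small synthetic problem. This is an illustrative reduction, not the authors' implementation: it uses 8 objects, 3 blocks, and 12 features instead of the study's dimensions, tests on block-averaged responses only, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def stability(R):                                   # R: (blocks, objects, features)
    z = (R - R.mean(1, keepdims=True)) / R.std(1, keepdims=True)
    return np.mean([(z[i] * z[j]).mean(0)
                    for i, j in combinations(range(len(R)), 2)], axis=0)

def fit(S, N, lam=3.16):                            # ridge weights, (K, n_features)
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ N)

def rank_acc(obs, pred, true):
    sim = pred @ obs / (np.linalg.norm(pred, axis=1) * np.linalg.norm(obs))
    i = int(np.where(np.argsort(-sim) == true)[0][0]) + 1
    return 100.0 * (len(pred) - i) / (len(pred) - 1)

def held_out_score(R, sem, train, test, feats):
    """Train on single trials of `train` objects; decode the block-averaged `test` response."""
    S = np.tile(sem[train], (R.shape[0], 1))        # one row per training trial
    N = R[:, train][:, :, feats].reshape(-1, len(feats))
    pred = sem @ fit(S, N)                          # predicted patterns for all objects
    return rank_acc(R[:, test][:, feats].mean(0), pred, test)

def nested_cv(R, sem, counts):
    n_obj = R.shape[1]
    outer_scores = []
    for test in range(n_obj):                       # outer loop: held-out object
        outer_train = [o for o in range(n_obj) if o != test]
        order = np.argsort(-stability(R[:, outer_train]))   # training objects only
        inner = np.zeros(len(counts))
        for val in outer_train:                     # inner loop: validation object
            inner_train = [o for o in outer_train if o != val]
            inner += np.array([held_out_score(R, sem, inner_train, val, order[:c])
                               for c in counts])
        best = counts[int(np.argmax(inner))]        # count with highest mean inner accuracy
        outer_scores.append(held_out_score(R, sem, outer_train, test, order[:best]))
    return float(np.mean(outer_scores))

rng = np.random.default_rng(0)
n_blocks, n_obj, n_feat, K = 3, 8, 12, 4
sem = rng.standard_normal((n_obj, K))
W = np.zeros((K, n_feat))
W[:, :4] = rng.standard_normal((K, 4))              # only 4 informative features
R = sem @ W + 0.3 * rng.standard_normal((n_blocks, n_obj, n_feat))
mra = nested_cv(R, sem, counts=[1, 2, 4, 8])        # mean rank accuracy over outer folds
```

The key property is that the held-out object of each outer fold contributes nothing to the stability scores, the choice of feature count, or the model weights used to decode it.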

Rank accuracy results were statistically thresholded using a Monte Carlo procedure. A null model was trained for each patient and for each testing condition by permuting the rows of the semantic attribute matrix, effectively shuffling the noun labels. All model estimation and validation procedures were replicated, and in each case, the procedure was repeated 50000 times to produce a null probability distribution. P-values were calculated for each patient and decoding condition by computing the fraction of null distribution values that were greater than the actual MRA (while adding 1 to both the numerator and denominator to prevent p = 0). An analogous procedure was performed to determine within-category rank accuracy thresholds, but for this Monte Carlo procedure, shuffling was done within categories rather than across all noun labels.
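The p-value calculation and the two shuffling schemes can be sketched as below. The function names `permutation_p_value` and `shuffle_labels` and the toy inputs are assumptions for illustration; in the study the null values would come from re-running the full model estimation on each shuffled matrix.

```python
import numpy as np

def permutation_p_value(actual, null_values):
    """Fraction of null values greater than `actual`, with 1 added to numerator and denominator."""
    null_values = np.asarray(null_values)
    return (np.sum(null_values > actual) + 1) / (len(null_values) + 1)

def shuffle_labels(sem, rng, categories=None):
    """Permute rows of the semantic matrix; within categories if `categories` is given."""
    idx = np.arange(len(sem))
    if categories is None:
        return sem[rng.permutation(idx)]
    categories = np.asarray(categories)
    for c in np.unique(categories):
        members = idx[categories == c]          # row positions belonging to category c
        idx[members] = rng.permutation(members)
    return sem[idx]

rng = np.random.default_rng(3)
p = permutation_p_value(75.0, [50.0] * 9 + [80.0])   # one of ten null values exceeds 75

sem = np.arange(12).reshape(6, 2)                     # toy semantic matrix, 6 nouns
shuffled = shuffle_labels(sem, rng, categories=[0, 0, 0, 1, 1, 1])
```

Adding 1 to both counts means the smallest attainable p-value with 50000 permutations is 1/50001, rather than exactly zero.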

2.3 Feature analyses

2.3.1 Informative time points

To assess the time course of semantic attribute information in the ECoG signal relative to spoken responses, models were trained and tested for neural features restricted to individual temporal windows. Only high-gamma features calculated using the Hilbert transform were considered, because models using only high-gamma features performed substantially better across subjects than models restricted to any other single frequency band (see Supplementary Figure 2B). Additionally, given the goal of identifying the temporal windows with the most semantic content, the Hilbert transform method allowed for higher temporal resolution in our neural features, and high-gamma features intrinsically have a higher temporal resolution than lower frequency features. A sliding window of 250 ms and a step size of 31.25 ms was adopted, and the timing of the onset of significant rank accuracy as well as the peak was recorded for high-gamma that was time-locked to stimulus onsets. We performed a similar analysis on high-gamma locked to spoken response onsets. Response onsets and offsets were determined manually by listening to the spoken responses and using simultaneous visual inspection of the speech recording's spectrotemporal content. Time points were conservatively chosen to ensure articulation was entirely captured. Significance was determined by performing a single-tailed rank-sum test to compare rank accuracies at each time window to baseline, which was defined as time points between -1000 and -125 ms relative to stimulus onset (because of the 250 ms window, baseline actually contained information from -1125 to 0 ms). Bonferroni multiple comparisons corrections were applied individually for each patient across all time points.
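The per-window significance test can be sketched with SciPy's Mann-Whitney U test, which is equivalent to the Wilcoxon rank-sum test. Several details are assumptions made here rather than stated in the text: the rank accuracies from all baseline windows are pooled into one sample, each window contributes one rank accuracy per object, and the function name `significant_windows` is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def significant_windows(window_scores, baseline_scores, alpha=0.05):
    """window_scores: (n_windows, n_objects); baseline_scores: 1-D pooled baseline values."""
    p = np.array([mannwhitneyu(w, baseline_scores, alternative="greater").pvalue
                  for w in window_scores])
    p_corrected = np.minimum(p * len(window_scores), 1.0)   # Bonferroni across time points
    return p_corrected < alpha, p_corrected

# Deterministic toy data: window 0 matches baseline, window 1 is clearly higher.
baseline = np.linspace(40.0, 60.0, 60)
windows = np.stack([baseline, baseline + 30.0])
sig, p_corr = significant_windows(windows, baseline)
onset_index = int(np.argmax(sig)) if sig.any() else None   # first significant window
```

The one-sided alternative tests only whether decoding in a window exceeds baseline, which matches the single-tailed test described in the text.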

For each window, the top 200 features were selected using correlation stability with proper cross-validation. Within temporal window analyses, the number of neural features was not selected within cross-validation, due to computational cost; rather, models were trained and tested with varying numbers of neural features, and the maximum MRA was chosen. Due to non-causality of spectral estimation (250/2 = 125 ms from windowing during Hilbert feature extraction), a given temporal window could have contained some information from future temporal windows. To account for this, results are also reported with respect to the leading edge of the extraction window (Table 3).

2.3.2 Informative electrodes

To determine where evoked responses were well-predicted by semantic attributes, we estimated encoding models that mapped semantic attributes to high-gamma responses at individual electrodes. This analysis focused on the three patients with the highest performing encoding models and was restricted to the high-gamma features from the 250 ms time window with the highest MRA for each patient. Cross-validation was used to estimate separate encoding models for each individual electrode, and predicted high-gamma values were calculated for each object. For each channel, we calculated Pearson correlations between the set of 360 actual high-gamma values (six presentations of 60 objects) and the set of 60 predicted high-gamma values (replicated six times to match the set of actual neural features). For a given channel, the resulting correlation coefficient indexed the degree to which high-gamma variance at that channel was accounted for by semantic attributes. Finally, p-values for correlation coefficients were calculated through Monte Carlo simulations for each patient that replicated the sensitivity analysis procedure with 10000 null semantic attribute matrices (formed through row shuffling). P-values were identified as significant using FDR correction (α = .05) across all electrodes for all three patients.
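The per-electrode correlation and the FDR step can be sketched as follows. The text does not name the FDR procedure; the Benjamini-Hochberg step-up method is assumed here, and the function names and example p-values are hypothetical.

```python
import numpy as np

def electrode_correlation(actual, predicted, n_reps=6):
    """actual: (n_reps * n_objects,) ordered block by block; predicted: (n_objects,)."""
    return np.corrcoef(actual, np.tile(predicted, n_reps))[0, 1]

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])      # largest rank meeting the BH criterion
        reject[order[:k + 1]] = True
    return reject

predicted = np.arange(60.0)                    # toy predicted values for 60 objects
r = electrode_correlation(np.tile(predicted, 6), predicted)

# Ten made-up electrode p-values; only the two smallest survive correction.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_vals)
```

Tiling the 60 predictions six times compares one prediction per object against every single-trial response, so trial-to-trial variability lowers the attainable correlation.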

To assess what semantic information was reliably encoded in the ECoG signal, we examined the correlations between a reduced set of semantic features and the high-gamma features recorded at the reliably informative electrodes. We used the high-gamma features directly (rather than the model weights) to eliminate the impact of specific choices for regularization. The human218 semantic attributes are highly redundant, so to reduce the number of informative attributes, we applied principal component analysis (PCA) to the full semantic matrix (218 semantic dimensions by 1000 nouns), and mapped the 60 nouns from the original human218 semantic attributes to PC semantic attributes. PCA was applied to the normalized human218 vectors, and the resulting PC semantic vectors were also normalized to unity length. Based on fraction of variance explained, 14 of the 218 PC attributes or dimensions were significant (p < 0.05) when thresholding using 1000 iterations of a Monte Carlo simulation comparing the PC attributes to PCs found with null semantic attribute matrices (shuffled across nouns and attributes). Semantic attributes will be provided upon request.
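The dimensionality reduction and its Monte Carlo threshold can be sketched as below. This is a minimal illustration, assuming that the null matrices are formed by permuting all entries of the semantic matrix and that the k-th observed component is compared with the k-th null component; the function names are hypothetical and the matrix is arranged as nouns by attributes.

```python
import numpy as np

def unit_rows(M):
    """Normalize each row vector to unit length."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def variance_fractions(M):
    """Fraction of variance explained by each PC of M (n_nouns, n_attributes)."""
    s = np.linalg.svd(M - M.mean(axis=0), compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

def n_significant_pcs(M, n_iter=1000, alpha=0.05, seed=0):
    """Number of leading PCs whose variance fraction exceeds the null."""
    rng = np.random.default_rng(seed)
    observed = variance_fractions(M)
    exceed = np.zeros_like(observed)
    for _ in range(n_iter):
        # Null matrix: entries shuffled across nouns and attributes
        null = rng.permutation(M.ravel()).reshape(M.shape)
        exceed += variance_fractions(null) >= observed
    p = (1 + exceed) / (1 + n_iter)
    sig = p < alpha
    return len(sig) if sig.all() else int(np.argmin(sig))

def project(M_full, M_subset, n_components):
    """Map a subset of nouns onto the leading PCs of the full matrix."""
    mean = M_full.mean(axis=0)
    _, _, Vt = np.linalg.svd(M_full - mean, full_matrices=False)
    return unit_rows((M_subset - mean) @ Vt[:n_components].T)
```

In this sketch, `unit_rows` would be applied to the attribute vectors before PCA, and `project` returns unit-length PC vectors for the stimulus nouns.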

Visual inspection of the significant PCs revealed that the first four components (fraction of variance explained: 16.5, 12.5, 6.3, and 3.6%) were readily interpretable. For each of these four PCs, we listed the nouns with the largest and smallest (most negative) values for the PC, as well as the human218 attributes with the largest and smallest projections onto the PC (inner products). Based on this information, these PCs or semantic dimensions were subjectively labeled man-made, large, manipulable, and edible, respectively (see Table 2). Note that these labels reflected the positive values of the PCs. In some cases, negative values indicated a semantic meaning that was intuitively opposite of the label (e.g., an intuitive opposite of man-made is alive), whereas for other PCs, the meaning of negative values was less clear (e.g., an intuitive opposite of edible might be inedible, but based on the attribute and object lists, this might be better labeled threatening).

| Ranking | PC1: Manmade, nouns | PC1: Manmade, attributes | PC2: Large, nouns | PC2: Large, attributes | PC3: Manipulable, nouns | PC3: Manipulable, attributes | PC4: Edible, nouns | PC4: Edible, attributes |
|---|---|---|---|---|---|---|---|---|
| Highest | tray | manmade | train | bigger than microwave oven | ipod | has a front and back | fruit | used during meals |
| Highest | clipboard | manufactured | factory | bigger than loaf of bread | toy | can be easily moved | watermelon | edible |
| Highest | frame | invented | hotel | bigger than bed | laptop | manufactured | pineapple | has internal structure |
| Highest | box | can use it | museum | taller than person | guitar | can pick it up | tomato | goes in mouth |
| Highest | magnet | has corners | capitol | bigger than car | phone | can hold it | grapefruit | tasty |
| Highest | envelope | has flat or straight sides | supermarket | heavy | watch | invented | tangerine | vegetable or plant |
| Highest | notebook | has writing on it | mansion | can fit inside it | vacuum | has moving parts | lime | smells good |
| Highest | dictionary | can buy it | bus | has corners | flute | can control it | vegetable | has parts |
| Highest | basket | made of metal | bank | has moving parts | toaster | can buy it | lettuce | you love it |
| Highest | lunchbox | hold it to use it | university | bigger than house | cellphone | has at least one hole | corn | comes from a plant |
| Lowest | leopard | has feelings | pepper | found in house | shore | taller than person | fog | can change shape |
| Lowest | wolf | has a backbone | sugar | found in restaurant | sea | can walk on it | stain | found in the sky |
| Lowest | gorilla | has feet | pickle | edible | island | has seeds | mist | lightweight |
| Lowest | bear | has some sort of nose | onion | hold it to use it | mist | can change shape | dot | scary |
| Lowest | monkey | animal | pear | goes in mouth | coast | comes from a plant | sunburn | flat |
| Lowest | antelope | has a face | grape | can be easily moved | puddle | vegetable or plant | rust | silver |
| Lowest | raccoon | conscious | cinnamon | lightweight | fog | part of something larger | dandruff | hard to catch |
| Lowest | deer | once alive | kernel | can pick it up | hill | bigger than house | measles | can cause you pain |
| Lowest | ape | alive | raisins | can hold it | meadow | bigger than bed | dent | dangerous |
| Lowest | hyena | grows | spice | can hold it in one hand | valley | bigger than car | spark | avoid touching it |

Table 2. Top four Principal Components (PC) from Human218. For each PC, the 10 objects with the highest values and the 10 objects with the lowest values for that component are listed, along with the 10 human218 attributes with the largest and smallest projections (i.e. inner products) on to that PC.

Correlation coefficients were computed between these four PCs and the responses from the informative high-gamma features using Pearson correlation. Coefficients were computed using all 60 objects and all six presentations for a total of 360 PC-feature pairs for each PC. Correlation p-values were calculated under a Gaussian distribution assumption.
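A minimal sketch of this correlation step follows. The p-value here uses the Fisher z approximation, which is one way of applying a Gaussian assumption and is an assumption of this sketch rather than a description of the exact test used; the function name and array shapes are illustrative. As in the analysis above, the 360 PC-feature pairs are treated as independent samples.

```python
import math
import numpy as np

def pc_feature_correlation(pc_scores, hg):
    """pc_scores: (n_objects,) values of one semantic PC.
    hg: (n_presentations, n_objects) high-gamma feature at one electrode.
    Returns Pearson r over all presentation-object pairs and a two-sided
    p-value from the Fisher z approximation."""
    x = np.tile(pc_scores, hg.shape[0])   # PC value repeated per presentation
    y = hg.ravel()                        # presentation-major order matches x
    r = np.corrcoef(x, y)[0, 1]
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    p = math.erfc(abs(z) / math.sqrt(2))
    return r, p
```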

3 RESULTS

Zero-shot object decoding using a model trained to encode 218 semantic attributes as neural activity features was significantly better than chance for all nine patients, for nearly all decoding conditions (Figure 2). FDR correction (α = .05) was applied across all 9 patients and 4 decoding conditions. The subjects (P1-P9) were arranged in decreasing order of MRA. Using recorded signals from all frequency bands aggregated across six presentations per object (to improve SNR), MRA across patients for held-out objects was 76% and ranged from 65% (P9) to 91% (P1), where chance rank accuracy is 50%. This means that for the patient with the best performing encoding model, the target object was ranked, on average, sixth amongst a list of 60 candidate objects (see Supplementary Table 2 for item-by-item results and comparison with decoding performance from a comparable fMRI data set). Note that the list of candidate objects is constrained only by the number of items for which semantic vectors are defined.
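The relationship between rank accuracy and rank can be made concrete with a short sketch. It assumes the usual normalized definition, in which rank accuracy is 1 when the target is ranked first and 0 when it is ranked last, and it ranks candidates by cosine similarity between the observed and predicted patterns; both the similarity measure and the function names are assumptions of this illustration. Under this definition, an MRA of 91% with 60 candidates corresponds to a mean rank of about 6.3, and chance (50%) corresponds to a mean rank of 30.5.

```python
import numpy as np

def rank_accuracy(rank, n_candidates):
    """1.0 when the target is ranked first, 0.0 when it is ranked last."""
    return 1.0 - (rank - 1) / (n_candidates - 1)

def mean_rank_from_accuracy(accuracy, n_candidates):
    """Inverse of rank_accuracy: the rank implied by a given accuracy."""
    return 1.0 + (1.0 - accuracy) * (n_candidates - 1)

def target_rank(observed, predicted, target_idx):
    """Rank of the target among candidates (1 = best match).
    observed: (n_features,) held-out neural pattern.
    predicted: (n_candidates, n_features) model predictions per candidate."""
    obs = observed / np.linalg.norm(observed)
    pred = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    sims = pred @ obs                      # cosine similarity per candidate
    return 1 + int(np.sum(sims > sims[target_idx]))

print(round(mean_rank_from_accuracy(0.91, 60), 2))  # 6.31
```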

FIGURE 2 HERE

Performance remained significantly better than chance for 8 of 9 patients when encoding models used high-gamma activity only (70-110 Hz) and trials were aggregated across presentations, and high-gamma models for all 9 patients were significant when decoding from individual trial responses. Compared to models using all frequency bands, high-gamma model mean rank accuracy across patients fell only slightly to 73% with a range of 58% (P9) to 88% (P1). Semantic information was still extractable under low SNR, high variability conditions of single trial testing, but mean rank accuracy fell substantially (mean = 67%, range = 56% - 82%; high-gamma mean 66%, range = 59% - 81%). Still, rank accuracy from the patient with the best performing encoding models (P1) remained remarkably high for single-trial decoding (82%).

We also calculated winner-take-all accuracy where classifications are scored as correct only when the target object occupies the top overall rank. These results can be found in Supplementary Figure 3.

3.1 Semantic resolution of the encoding model

To assess whether the learned models encoded semantic detail beyond the basic semantic category associated with each object, we calculated within-category rank accuracies for all patients and for all objects. For example, when decoding the left-out neural activity pattern evoked by the item butterfly, we rank ordered only the five objects from the stimulus set from the insect category (ant, bee, beetle, butterfly, and fly), rather than the entire set of 60 objects. Within-category rank accuracies that are reliably higher than chance (i.e. 50%) would therefore indicate that the model was encoding semantic detail of a finer grain than category identity.

We restricted our analysis to conditions where neural responses were aggregated across 6 presentations for both all-frequency and high-gamma models. Results were mixed (see Supplementary Figure 4), with mean rank accuracy (MRA) ranging across patients from 38% to 66% using all frequencies and from 39% to 67% using high-gamma alone. Within-category rank accuracies using all frequencies were significantly better than chance in 4 patients: P1 (61%), P2 (63%), P3 (66%), and P4 (65%). Using high-gamma encoding models, within-category rank accuracies were significantly better than chance in two patients: P1 (67%) and P3 (65%). Significance was determined by applying FDR correction (α = .05) across all 9 patients and both decoding conditions. These results suggest that under some circumstances, object-specific semantic information beyond category-level semantics is extractable from ECoG.
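Within-category rank accuracy can be illustrated with a short sketch. It assumes that each candidate has a similarity score between the observed pattern and the candidate's predicted pattern, and that ranking is simply restricted to the members of the target's category; the function name and the similarity values in the example are made up for illustration.

```python
import numpy as np

def within_category_rank_accuracy(similarities, target_idx, category_members):
    """similarities: (n_candidates,) similarity of the observed pattern to
    each candidate's predicted pattern.
    category_members: indices of candidates in the target's category,
    including the target itself."""
    members = np.asarray(category_members)
    sims = similarities[members]
    rank = 1 + int(np.sum(sims > similarities[target_idx]))
    return 1.0 - (rank - 1) / (len(members) - 1)

# Example with the five insects; the similarity values are hypothetical.
insects = ["ant", "bee", "beetle", "butterfly", "fly"]
sims = np.array([0.2, 0.5, 0.1, 0.4, 0.3])
acc = within_category_rank_accuracy(sims, insects.index("butterfly"), range(5))
print(acc)  # 0.75: only "bee" outranks the target
```

With five candidates per category, a single within-category decode can only take the values 0, 0.25, 0.5, 0.75, or 1, so per-patient averages over all objects are needed before comparing against the 50% chance level.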

3.2 Informative time points

Semantic processing during visual object naming has been demonstrated as early as 110 ms after stimulus onset (Clarke et al., 2014), and has been shown to continue through the spoken response (Chen et al., 2016). To give encoding models access to the full time course of semantic processing, while at the same time limiting access to spoken output processing, decoding results reported thus far were limited to neural features starting at stimulus onset and ending at 750 ms. To better measure the time course of semantic activity and its relation to spoken responses, a sliding window decoding approach was also tested. Because the temporal resolution of power estimates in the high-gamma frequency range is intrinsically better than that of estimates in lower frequencies, we focused our analysis on decoding accuracies using high-gamma frequencies only.

FIGURE 3 HERE

In all patients except P7, high-gamma encoding model performance from at least one individual window was significantly better than chance (Figure 3). Significance was determined by comparing each individual time point to a baseline (pre-stimulus) distribution and applying a Bonferroni correction across all time points for a single subject. Onset of significant MRA ranged from 94 ms (P2 and P3) to 1156 ms (P8), with a median of 235 ms. Peak MRA ranged from 313 ms (P1 and P2) to 1156 ms (P8) with a median of 407 ms. Out of 8 subjects with significant MRAs, 5 patients (P1-4, P6) had MRA peak accuracies that occurred before any spoken responses. Even when accounting for the non-causality of spectral estimation (Table 3), 3 patients (P1-2, P6) had MRA peaks that preceded the earliest spoken response. We also explored response-locked MRA, and found that 6 out of 9 subjects had MRAs that were significantly better than chance (see Supplementary Figure 5). The onset of significance for these results preceded speech onset for 5 of these 6 subjects.

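| Patient | Significant decoding window: onset (ms) | Significant decoding window: peak (ms) | Speech onset: earliest (ms) | Speech onset: median (ms) | Speech onsets before peak MRA: uncorrected (%) | Speech onsets before peak MRA: corrected (%) |
|---|---|---|---|---|---|---|
| P1 | 125 | 313 | 499 | 792 | 0.0 | 0.0 |
| P2 | 94 | 313 | 468 | 816 | 0.0 | 0.0 |
| P3 | 94 | 344 | 423 | 835 | 0.0 | 0.3 |
| P4 | 250 | 469 | 494 | 833 | 0.0 | 3.7 |
| P5 | 406 | 625 | 462 | 942 | 4.1 | 18.4 |
| P6 | 219 | 313 | 573 | 991 | 0.0 | 0.0 |
| P7 | N/A | N/A | 560 | 1005 | N/A | N/A |
| P8 | 1156 | 1156 | 436 | 805 | 78.7 | 84 |
| P9 | 500 | 500 | 392 | 855 | 29.3 | 53.6 |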

Table 3. Peak decoding performance compared to speech onset times. The timing of significant decoding windows (both onset and peak) corresponds to the center of the 250 ms window used to estimate the high-gamma. The last column in this table accounts for this non-causality by adding 125 ms to the peak MRA time, and then calculating the fraction of speech onsets that occur before this adjusted time point.
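The effect of the 125 ms correction can be checked directly from the peak times and earliest speech onsets in Table 3. The short sketch below is only a worked illustration of that arithmetic; the variable and function names are hypothetical, and P7 is omitted because it had no significant window.

```python
# (peak decoding time at window center, earliest speech onset), in ms, from Table 3
table3 = {
    "P1": (313, 499), "P2": (313, 468), "P3": (344, 423), "P4": (469, 494),
    "P5": (625, 462), "P6": (313, 573), "P8": (1156, 436), "P9": (500, 392),
}
HALF_WINDOW_MS = 125  # half of the 250 ms estimation window

def peaks_before_speech(table, correction_ms=0):
    """Patients whose (optionally corrected) peak precedes the earliest speech onset."""
    return [patient for patient, (peak, earliest) in table.items()
            if peak + correction_ms < earliest]

uncorrected = peaks_before_speech(table3)
corrected = peaks_before_speech(table3, HALF_WINDOW_MS)
print(uncorrected)  # ['P1', 'P2', 'P3', 'P4', 'P6']
print(corrected)    # ['P1', 'P2', 'P6']
```

Shifting each peak to the leading edge of its window removes P3 and P4 from the list, which matches the counts of five and three patients reported in the text.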

3.3 Informative electrodes

For the three patients with the highest performing encoding models (P1, P2, and P3), high-gamma activity from individual electrodes was analyzed for semantic attribute information. For each patient, the 250 ms window with the maximum overall MRA was selected for further analysis: 188-438 ms (centered at 313 ms) for P1 and P2, and 219-469 ms (centered at 344 ms) for P3. For all three patients analyzed, high-gamma responses during the optimal decoding windows reliably encoded semantic attribute information along the left (dominant) ventral visual pathway (Figure 4). Responses from the left fusiform (P1, P2, P3), inferior temporal gyrus (P1, P2), and the parahippocampal gyrus (P1, P3) were significantly predicted by the semantic attribute encoding models. While there was less agreement across patients beyond the ventral visual pathway, high-gamma responses reliably encoded semantic attribute information at middle and superior temporal electrodes as well as supramarginal electrodes in P1, and from several inferior frontal electrodes in P2.

Finally, we examined the semantic profiles of each of the significant electrodes in basal occipitotemporal cortex by calculating the correlation between the high-gamma responses during optimal decoding windows and each of the four top semantic PCs (in decreasing order: manmade, large, manipulable, and edible). For all three patients, high-gamma responses at multiple sites in basal occipitotemporal cortex were significantly correlated with the semantic dimensions associated with manmade/living and size distinctions (see Figure 4b). In all electrodes where the manmade dimension was significant, the size dimension was also significant. Furthermore, the signs of these two correlations were always matched, i.e. electrodes with positive manmade correlations also had positive correlations with the size dimension and vice versa. Note that nouns with negative loadings on the manmade dimension can readily be interpreted as living (see Table 2). All electrodes with positive correlations on the manmade and large dimensions are located medially on the basal occipitotemporal surface (P1: a and b; P3: a, b, c, and d). Conversely, electrodes with negative correlations on these two dimensions are located more laterally on the basal cortex (P1: c and d; P2: a, b, and c). Following these results, larger and manmade things evoke more high-gamma activity medially and smaller and living things evoke more high-gamma activity laterally. Significant negative correlations with the manipulable dimension were observed in medial regions in or near the parahippocampal gyrus (P1: e, f, and g; P3: a and d). Nouns that loaded negatively on this dimension can be categorized as scenes and places (or in the specific case of our stimulus set, buildings, and to a lesser extent, building parts). A single basal electrode (g in P1) showed a positive correlation with the edible dimension.

4 DISCUSSION

[...] surface of the brain in [...] information about a variety of representations, and that these responses can be used for decoding with varying degrees of performance (see Gunduz et al., 2012 for spatial attention; Hotson et al., 2016 for motor control; Martin et al., 2016 for speech production; Pasley et al., 2012 for speech perception). The series of results presented in the current study demonstrate that ECoG responses recorded during visual object naming are semantically rich, and untrained objects can be accurately decoded from these responses at rates equivalent to whole-brain fMRI in healthy subjects (despite variability in electrode placement and the presence of lesions in some patients). Beyond high rates of decoding accuracy, we observed that high-gamma activity recorded approximately 200-500 ms after stimulus onset was associated with specific semantic dimensions for a subset of patients with basal occipitotemporal electrode coverage.

4.1 Frequency encoding of semantic attributes

Semantic attribute information was consistently found in high-gamma band activity, and the addition of other frequency bands yielded only slight improvements to the trained encoding models. Oscillatory activity and synchronization in the gamma range (25-128 Hz) and especially the high-gamma range (defined as 70-110 Hz in the current report) appear to be crucial for neural and [...] (Fries, 2009). Activity in this frequency band has been strongly linked to different representations and processes, from low-level perceptual features to abstract conceptual features during visual processing (Jacobs and Kahana, 2009), voluntary motor commands (Cheyne et al., 2008), as well as language processing (Crone et al., 2006), where very accurate language mapping for neurosurgical patients has been demonstrated from high-gamma activity (Babajani-Feremi et al., 2015).

4.2 Timing of semantic attribute information

The time course of semantic processing during object recognition can be estimated by tracking encoding model performance over time. The best semantic encoding models (i.e. for P1, P2, and P3) began performing significantly better than chance at a mean of 104 ms post stimulus onset, with accuracies peaking at a mean of 323 ms (using 250 ms windows centered at the reported times). Comparable results in visual object recognition have been reported with EEG (VanRullen and Thorpe, 2001; Simanova et al., 2010; Chan et al., 2011b), MEG (Clarke et al., 2014; Cichy et al., 2016), and ECoG (Vidal et al., 2010; Chan et al., 2011a). Moreover, size and position-invariant visual object representations begin to emerge at 125 ms and 150 ms respectively (Isik et al., 2014), and more abstract semantic category or attribute information becomes available between 200 and 500 ms (Clarke et al., 2014).

4.3 Semantic encoding in basal occipitotemporal cortex

For the visual object naming task reported here, high-gamma responses from left basal occipitotemporal cortex were well predicted by our semantic attribute encoding models. Electrode placement and encoding model performance varied across patients, but the patients with the best performing encoding models all had electrode strips covering the left language-dominant fusiform gyrus. For all three of these patients, high-gamma activity recorded at fusiform electrodes 200-500 ms post-stimulus onset was significantly predicted by the semantic attributes of the named object. High-gamma activity from neighboring electrodes over inferior temporal and parahippocampal gyri was also significantly predicted in a subset of these patients.

These results appear to be in close agreement, both spatially and temporally, with other studies relating semantic attributes to neural responses (Chen et al., 2016; Sudre et al., 2012). Left fusiform involvement is commonly reported during visual semantic tasks like naming, reading, and categorization. While the region is perhaps most associated with visual word forms and orthographic processing (Tsapkini et al., 2011), findings that different semantic categories like animals and tools differentially activate the fusiform gyrus are well-established (Simanova et al., 2014; Ishibashi et al., 2016). It appears that this region links visual form to meaning in hierarchical processing stages from occipital cortex to the medial and anterior temporal lobe (Patterson et al., 2007; Rogers et al., 2006; Starrfelt and Gerlach, 2007), and projects to distributed semantic representations throughout cortex (Binder and Desai, 2011).

4.3.1 Semantic dimensions

Moving beyond strict categorical distinctions, our results show that information along particular semantic dimensions is encoded in basal occipitotemporal ECoG responses. While our encoding models used 218 attributes to predict ECoG responses, semantic dimensionality reduction was necessary to interpret the observed encoding patterns. PCA on the human218 database indicated that semantic variability could be mostly captured by four semantic dimensions: the degrees to which an object is manmade, large, manipulable, and edible. We used this semantic dimensionality reduction to interpret activity at the electrodes that were significantly predicted by the encoding model. Results showed that basal occipitotemporal responses in the high-gamma range were closely associated with the first three of these four dimensions, though responses from one anterior fusiform electrode for one patient were significantly correlated with values along the edible dimension.

For the dimensions labeled manmade and large, a medial to lateral functional organization was observed along basal occipitotemporal cortex. Values along these two dimensions positively correlated with high-gamma responses from medial electrodes, and negatively correlated with responses from lateral electrodes. As an illustration of this organization, an airplane (with high values on the manmade and large dimensions) elicits more high-gamma activity in medial electrodes as compared to lateral electrodes, an ant (with low values on the manmade and large dimensions) elicits more high-gamma activity in lateral as compared to medial electrodes, and items that are split on these dimensions (e.g. a relatively small but manmade object like a spoon) elicit moderate high-gamma activity medially and laterally (on average). This pattern was fully observed in subject P1; subject P2 had only lateral coverage (with negative correlations for these dimensions); subject P3 had only medial coverage (with positive correlations for these dimensions).

Interestingly, the parcellation of fusiform gyrus into ventromedial and ventrolateral regions as suggested by these results is supported by independent functional and anatomical connectivity analyses (Zhang et al., 2016), as well as task-based fMRI results. The medial-to-lateral organization of ventral temporal cortex for the manmade/living or animate/inanimate distinction has been well studied (Chao et al., 1999; Downing et al., 2006; Bell et al., 2009; Wiggett et al., 2009), and in other work, the large-to-small organization has also been observed along this axis (Konkle and Oliva, 2012). Konkle and Caramazza (2013) varied animacy and size simultaneously across a large set of animals and objects, and found that medial regions preferentially responded to large objects (congruent with our results), while lateral regions preferentially responded to animals regardless of size (partially congruent with our results). There are many factors to consider when resolving these results with the current report: relative to Konkle and Caramazza, animate items may have been under-represented in our stimulus set; many continuous dimensions were simultaneously varied here while animacy and size were treated as binary variables by Konkle and Caramazza; there may be individual differences in this organization that are not captured by our small patient group.

The semantic dimension labeled manipulable also correlated with high-gamma activity from multiple electrodes. Significant negative correlations for this dimension were largely observed at medial basal temporal sites, including parahippocampal cortex. Importantly, negative loadings on this dimension corresponded to places and geographic features in the data set used to generate the semantic dimensions, and primarily buildings and building parts within the 60 experimental stimuli. The parahippocampal place area is involved in processing place and scene information (Aguirre et al., 1996; Epstein and Kanwisher, 1998), consistent with our results.

4.3.2 Perceptual features or semantic attributes?

Sensitivity to semantic attributes and categories in basal occipitotemporal cortex may be partially accounted for by differences or confounds in low-level visual features that exist between semantic categories. Indeed, the dimensions focused on here (e.g. animate-inanimate, large-small, and tools-places) can be associated with both visual and semantic concepts. For example, the animate-inanimate dimension may relate to visual features in that several of the semantic attributes that had the highest projections on to this dimension had to do with visual structure and form (e.g. has corners, has flat or straight sides, has a face). Canonical size is another feature that has both semantic and perceptual interpretations: surprisingly, it was recently shown that canonical size information can be recovered from primary visual responses during a reading task (Borghesani et al., 2016), blurring the line between traditional distinctions of perceptual and semantic features.

While low-level features were not strictly controlled for or modeled in the current study, some studies of semantic decoding have attempted to account for low-level visual differences by including perceptual features in their models (Sudre et al., 2012; Clarke et al., 2014; Borghesani et al., 2016). These studies found posterior to anterior gradients from perceptual to conceptual representation such that posterior regions of basal occipitotemporal cortex contained information related to perceptual features, while regions just anterior contained information related to semantic features. Other studies have shown evidence for semantic representation in fusiform responses through semantic priming of words (Gold et al., 2006) and cross-modal generalization, where classifiers trained to discriminate animals from tools from left fusiform responses to one stimulus class (i.e. spoken names, written names, photographs, and natural sounds) can discriminate animals from tools using responses evoked by a different stimulus class (Simanova et al., 2014). Most of these results come from fMRI; the only other study to date to analyze ECoG responses for semantic attribute information reported that semantic attribute models were much more predictive of neural responses than basic visual or phonological feature models, particularly in more anterior aspects of basal temporal cortex (Chen et al., 2016).

4.4 Limitations of the model

Very accurate decoding performance was achieved for untrained objects in a subset of patients, but our model was very modest in terms of the semantic embedding space, the neural features, and the statistical learning methods used to relate the two. Several different semantic embedding spaces have been used for predicting and interpreting neural data: those based on corpus statistics, those based on human judgments, and those that attempt to use neural responses themselves to define or optimize the embedding space for neural decoding (Fyshe et al., 2014). Different semantic embeddings are likely to be better matched for different recording modalities and paradigms, but whether there is substantial room for improvement beyond current results is unclear given the SNR and resolution achievable with today's neuroimaging tools (Bullinaria and Levy, 2013). In this work, we focused on a very limited set of concrete objects, but training encoding models for more complex concepts will require embeddings that can support more abstract concepts and concept compositionality.

5 CONCLUSION

Responses recorded with ECoG during visual object naming contain rich semantic attribute information that can be used both to decode untrained objects at very high levels of performance and to study semantic encodings within individual subjects. For a subset of patients with basal occipitotemporal electrode coverage, we observed that high-gamma activity recorded approximately 200-500 ms after stimulus onset was associated with specific semantic dimensions: manmade-animate, canonically large-small, and places-tools. Individual patient results were in surprisingly close agreement with reports from other modalities on the functional organization of semantic information in ventral temporal cortex during object recognition. Semantic attribute encoding models are powerful tools that are critical both for generalizing outside the training set and for studying semantic encodings among large sets of diverse categories.

6 REFERENCES

Aguirre, G.K., Detre, J.A., Alsop, D.C., D'Esposito, M., 1996. The Parahippocampus Subserves

Topographical Learning in Man. Cereb. Cortex 6, 823-829. doi:10.1093/cercor/6.6.823 Babajani-Feremi, A., Narayana, S., Rezaie, R., Choudhri, A.F., Fulton, S.P., Boop, F.A., Wheless, J.W., Papanicolaou, A.C., 2015. Language mapping using high gamma electrocorticography, fMRI, and TMS versus electrocortical stimulation. Clin. Neurophysiol. doi:10.1016/j.clinph.2015.11.017 Bell, A.H., Hadj-Bouziane, F., Frihauf, J.B., Tootell, R.B.H., Ungerleider, L.G., 2009. Object Representations in the Temporal Cortex of Monkeys and Humans as Revealed by Functional Magnetic Resonance Imaging. J. Neurophysiol. 101, 688-700. doi: 10.1152/jn.90657.2008 Binder, J.R., Conant, L.L., Humphries, C.J., Fernandino, L., Simons, S.B., Aguilar, M., Desai, R.H., 2016. Toward a brain-based componential semantic representation. Cogn. Neuropsychol. 33, 130-174. doi:10.1080/02643294.2016.1147426 Binder, J.R., Desai, R.H., 2011. The neurobiology of semantic memory. Trends Cogn. Sci. 15,

527-536. doi:10.1016/j.tics.2011.10.001 Borghesani, V., Pedregosa, F., Buiatti, M., Amadon, A., Eger, E., Piazza, M., 2016. WORD MEANING IN THE VENTRAL VISUAL PATH: A PERCEPTUAL TO CONCEPTUAL GRADIENT OF SEMANTIC CODING. NeuroImage. doi:10.1016/j .neuroimage.2016.08.068 Bullinaria, J.A., Levy, J.P., 2013. Limiting factors for mapping corpus-based semantic

representations to brain activity. Chan, A.M., Baker, J.M., Eskandar, E., Schomer, D., Ulbert, I., Marinkovic, K., Cash, S.S.,

Halgren, E., 2011a. First-pass selectivity for semantic categories in human anteroventral temporal lobe. J. Neurosci. 3|y(8119-18129. doi:10.1523/JNEUROSCI.3122-11.2011 Chan, A.M., Halgren, E., Marinkovic, K., Cash, S.S., 2011b. Decoding word and category-specific spatiotemporal representations from MEG and EEG. NeuroImage 54, 30283039. doi:10.1016/j .neuroimage.2010.10.073 Chao, L.L., Haxby, J.V., Martin, A., 1999. Attribute-based neural substrates in temporal cortex

for perceiving and knowing about objects. Nat. Neurosci. 2, 913-919. Chen, M., Han, J., Hu, X., Jiang, X., Guo, L., Liu, T., 2014. Survey of encoding and decoding of visual stimulus via FMRI: an image analysis perspective. Brain Imaging Behav. 8, 7-23. doi: 10.1007/s 11682-013-9238-z Chen, Y., Shimotake, A., Matsumoto, R., Kunieda, T., Kikuchi, T., Miyamoto, S., Fukuyama, H., Takahashi, R., Ikeda, A., Lambon Ralph, M.A., 2016. The "when" and "where" of semantic coding in the anterior temporal lobe: temporal representational similarity analysis of electrocorticogram data. Cortex. doi:10.1016/j.cortex.2016.02.015 Cheyne, D., Bells, S., Ferrari, P., Gaetz, W., Bostan, A.C., 2008. Self-paced movements induce high-frequency gamma oscillations in primary motor cortex. Neuroimage 42, 332-342. Cichy, R.M., Pantazis, D., Oliva, A., 2016. Similarity-Based Fusion of MEG and fMRI Reveals Spatio-Temporal Dynamics in Human Cortex During Visual Object Recognition. Cereb. Cortex 26, 3563-3579. doi:10.1093/cercor/bhw135

Clarke, A., Devereux, B.J., Randall, B., Tyler, L.K., 2014. Predicting the Time Course of

Individual Objects with MEG. Cereb. Cortex. doi:10.1093/cercor/bhu203 Cree, G.S., McRae, K., 2003. Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). J. Exp. Psychol. Gen. 132, 163. Crone, N.E., Sinai, A., Korzeniewska, A., 2006. High-frequency gamma oscillations and human

brain mapping with electrocorticography. Prog. Brain Res. 159, 275-295. £ukur, T., Nishimoto, S., Huth, A.G., Gallant, J.L., 2013. Attention during natural vision warps

semantic representation across the human brain. Nat. Neurosci. 16, 763-770. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A., 1990. Indexing

by latent semantic analysis. JAsIs 41, 391-407. Downing, P.E., Chan, A.W.-Y., Peelen, M.V., Dodds, C.M., Kanwisher, N., 2006. Domain

Specificity in Visual Cortex. Cereb. Cortex 16, 1453-1461. doi:10.1093/cercor/bhj086 Epstein, R., Kanwisher, N., 1998. A cortical representation of the local visual environment.

Nature 392, 598-601. doi:10.1038/33402 Friedman, J., Hastie, T., Tibshirani, R., 2001. The elements of statistical learning. Springer series

in statistics Springer, Berlin. Fries, P., 2009. Neuronal gamma-band synchronization as a fundamental process in cortical

computation. Annu. Rev. Neurosci. 32, 209-224. Fyshe, A., Talukdar, P.P., Murphy, B., Mitchell, T.M., 2014. Interpretable Semantic Vectors

from a Joint Model of Brain-and Text-Based Meaning, in: The 52nd Annual Meeting of the Association for Computational Linguistics. Garrard, P., Lambon Ralph, M.A., Hodges, J.R., Patterson, K., 2001. Prototypicality,

distinctiveness, and intercorrelation: Analyses of the semantic attributes of living and nonliving concepts. Cogn. Neuropsychol. 18, 125-174. Gold, B.T., Balota, D.A., Jones, S.J., Powell, D.K., Smith, C.D., Andersen, A.H., 2006. Dissociation of Automatic and Strategic Lexical-Semantics: Functional Magnetic Resonance Imaging Evidence for Differing Roles of Multiple Frontotemporal Regions. J. Neurosci. 26, 6523-6532. doi:10.1523/JNEUR0SCI.0808-06.2006 Gunduz, A., Brunner, P., Daitch, A., Leuthardt, E.C., Ritaccio, A.L., Pesaran, B., Schalk, G., 2012. Decoding covert spatial attention using electrocorticographic (ECoG) signals in humans. Neuroimage 60, 2285-2293. Hotson, G., McMullen, D.P., Fifer, M.S., Johannes, M.S., Katyal, K.D., Para, M.P., Armiger, R., Anderson, W.S., Thakor, N.V., Wester, B.A., 2016. Individual finger control of a modular prosthetic limb using high-density electrocorticography in a human subject. J. Neural Eng. 13, 26017.

Huth, A.G., de Heer, W.A., Griffiths, T.L., Theunissen, F.E., Gallant, J.L., 2016. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature 532, 453-458.

Huth, A.G., Nishimoto, S., Vu, A.T., Gallant, J.L., 2012. A Continuous Semantic Space Describes the Representation of Thousands of Object and Action Categories across the Human Brain. Neuron 76, 1210-1224. doi:10.1016/j.neuron.2012.10.014

Ishibashi, R., Pobric, G., Saito, S., Lambon Ralph, M.A., 2016. The neural network for tool-related cognition: An activation likelihood estimation meta-analysis of 49 neuroimaging studies. Cogn. Neuropsychol. 33(3-4), 241-256.

Isik, L., Meyers, E.M., Leibo, J.Z., Poggio, T., 2014. The dynamics of invariant object recognition in the human visual system. J. Neurophysiol. 111, 91-102. doi:10.1152/jn.00394.2013

Jacobs, J., Kahana, M.J., 2009. Neural Representations of Individual Stimuli in Humans Revealed by Gamma-Band Electrocorticographic Activity. J. Neurosci. 29, 10203-10214. doi:10.1523/JNEUROSCI.2187-09.2009

Just, M.A., Cherkassky, V.L., Aryal, S., Mitchell, T.M., 2010. A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5, e8622. doi:10.1371/journal.pone.0008622

Kay, S.M., 1988. Modern spectral estimation. Pearson Education India.

Konkle, T., Caramazza, A., 2013. Tripartite Organization of the Ventral Stream by Animacy and Object Size. J. Neurosci. 33, 10235-10242. doi:10.1523/JNEUROSCI.0983-13.2013

Konkle, T., Oliva, A., 2012. A Real-World Size Organization of Object Responses in Occipitotemporal Cortex. Neuron 74, 1114-1124. doi:10.1016/j.neuron.2012.04.036

Liu, H., Agam, Y., Madsen, J.R., Kreiman, G., 2009. Timing, Timing, Timing: Fast Decoding of Object Information from Intracranial Field Potentials in Human Visual Cortex. Neuron 62, 281-290. doi:10.1016/j.neuron.2009.02.025

Logothetis, N.K., Pauls, J., Augath, M., Trinath, T., Oeltermann, A., 2001. Neurophysiological investigation of the basis of the fMRI signal. Nature 412, 150-157. doi:10.1038/35084005

Manning, J.R., Jacobs, J., Fried, I., Kahana, M.J., 2009. Broadband Shifts in Local Field Potential Power Spectra Are Correlated with Single-Neuron Spiking in Humans. J. Neurosci. 29, 13613-13620. doi:10.1523/JNEUROSCI.2041-09.2009

Martin, S., Brunner, P., Iturrate, I., Millan, J. del R., Schalk, G., Knight, R.T., Pasley, B.N., 2016. Word pair classification during imagined speech using direct brain recordings. Sci. Rep. 6.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. Distributed representations of words and phrases and their compositionality, in: Advances in Neural Information Processing Systems. pp. 3111-3119.

Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.-M., Malave, V.L., Mason, R.A., Just, M.A., 2008. Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191-1195. doi:10.1126/science.1152876

Mugler, E.M., Patton, J.L., Flint, R.D., Wright, Z.A., Schuele, S.U., Rosenow, J., Shih, J.J., Krusienski, D.J., Slutzky, M.W., 2014. Direct classification of all American English phonemes using signals from functional speech motor cortex. J. Neural Eng. 11, 35015. doi:10.1088/1741-2560/11/3/035015

Niessing, J., Ebisch, B., Schmidt, K.E., Niessing, M., Singer, W., Galuske, R.A.W., 2005. Hemodynamic Signals Correlate Tightly with Synchronized Gamma Oscillations. Science 309, 948-951. doi:10.1126/science.1110948

Oppenheim, A.V., Schafer, R.W., 1999. Discrete-Time Signal Processing, 2nd ed. Prentice-Hall Signal Processing Series.

Palatucci, M., Pomerleau, D., Hinton, G.E., Mitchell, T.M., 2009. Zero-shot learning with semantic output codes, in: Advances in Neural Information Processing Systems. pp. 1410-1418.

Pasley, B.N., David, S.V., Mesgarani, N., Flinker, A., Shamma, S.A., Crone, N.E., Knight, R.T., Chang, E.F., 2012. Reconstructing speech from human auditory cortex. PLoS Biol 10, e1001251.

Patterson, K., Nestor, P.J., Rogers, T.T., 2007. Where do you know what you know? The representation of semantic knowledge in the human brain. Nat. Rev. Neurosci. 8, 976-987.

Pennington, J., Socher, R., Manning, C.D., 2014. GloVe: Global Vectors for Word Representation, in: EMNLP. pp. 1532-1543.

Pereira, F., Gershman, S., Ritter, S., Botvinick, M., 2016. A comparative evaluation of off-the-shelf distributed semantic representations for modelling behavioural data. Cogn. Neuropsychol. 33, 175-190.

Ray, S., Crone, N.E., Niebur, E., Franaszczuk, P.J., Hsiao, S.S., 2008. Neural Correlates of High-Gamma Oscillations (60-200 Hz) in Macaque Local Field Potentials and Their Potential Implications in Electrocorticography. J. Neurosci. 28, 11526-11536. doi:10.1523/JNEUROSCI.2848-08.2008

Rogers, T.T., Hocking, J., Noppeney, U., Mechelli, A., Gorno-Tempini, M.L., Patterson, K., Price, C.J., 2006. Anterior temporal cortex and semantic memory: reconciling findings from neuropsychology and functional imaging. Cogn. Affect. Behav. Neurosci. 6, 201-213.

Rosch, E., 1978. Principles of categorization, in: Lloyd, B., Rosch, E. (Eds.), Cognition and Categorization. Erlbaum Associates, Hillsdale, NJ, pp. 27-48.

Ruts, W., De Deyne, S., Ameel, E., Vanpaemel, W., Verbeemen, T., Storms, G., 2004. Dutch norm data for 13 semantic categories and 338 exemplars. Behav. Res. Methods Instrum. Comput. 36, 506-515.

Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R., 2004. BCI2000: a general-purpose brain-computer interface (BCI) system. Biomed. Eng. IEEE Trans. On 51, 1034-1043.

Shinkareva, S.V., Mason, R.A., Malave, V.L., Wang, W., Mitchell, T.M., Just, M.A., 2008. Using fMRI Brain Activation to Identify Cognitive States Associated with Perception of Tools and Dwellings. PLoS ONE 3, e1394. doi:10.1371/journal.pone.0001394

Simanova, I., Hagoort, P., Oostenveld, R., van Gerven, M.A.J., 2014. Modality-Independent Decoding of Semantic Information from the Human Brain. Cereb. Cortex 24, 426-434. doi:10.1093/cercor/bhs324

Simanova, I., van Gerven, M.A.J., Oostenveld, R., Hagoort, P., 2015. Predicting the Semantic Category of Internally Generated Words from Neuromagnetic Recordings. J. Cogn. Neurosci. 27, 35-45. doi:10.1162/jocn_a_00690

Simanova, I., van Gerven, M., Oostenveld, R., Hagoort, P., 2010. Identifying Object Categories from Event-Related EEG: Toward Decoding of Conceptual Representations. PLoS ONE 5, e14465. doi:10.1371/journal.pone.0014465

Starrfelt, R., Gerlach, C., 2007. The visual what for area: words and pictures in the left fusiform gyrus. Neuroimage 35, 334-342.

Sudre, G., Pomerleau, D., Palatucci, M., Wehbe, L., Fyshe, A., Salmelin, R., Mitchell, T., 2012. Tracking neural coding of perceptual and semantic features of concrete nouns. NeuroImage 62, 451-463. doi:10.1016/j.neuroimage.2012.04.048

Tsapkini, K., Vindiola, M., Rapp, B., 2011. Patterns of brain reorganization subsequent to left fusiform damage: fMRI evidence from visual processing of words and pseudowords, faces and objects. Neuroimage 55, 1357-1372.

VanRullen, R., Thorpe, S.J., 2001. The Time Course of Visual Processing: From Early Perception to Decision-Making. J. Cogn. Neurosci. 13, 454-461. doi:10.1162/08989290152001880

Vidal, J.R., Ossandon, T., Jerbi, K., Dalal, S.S., Minotti, L., Ryvlin, P., Kahane, P., Lachaux, J.-P., 2010. Category-specific visual responses: an intracranial study comparing gamma, beta, alpha, and ERP response selectivity. Front. Hum. Neurosci. 4, 195. doi:10.3389/fnhum.2010.00195

Wang, W., Degenhart, A.D., Sudre, G.P., Pomerleau, D., Tyler-Kabara, E.C., others, 2011. Decoding semantic information from human electrocorticographic (ECoG) signals, in: Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE. IEEE, pp. 6294-6298.

Wehbe, L., Murphy, B., Talukdar, P., Fyshe, A., Ramdas, A., Mitchell, T., 2014. Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses.

Wiggett, A.J., Pritchard, I.C., Downing, P.E., 2009. Animate and inanimate objects in human visual cortex: Evidence for task-independent category effects. Neuropsychologia 47, 3111-3117. doi:10.1016/j.neuropsychologia.2009.07.008

Zhang, W., Wang, J., Fan, L., Zhang, Y., Fox, P.T., Eickhoff, S.B., Yu, C., Jiang, T., 2016. Functional organization of the fusiform gyrus revealed with connectivity profiles. Hum. Brain Mapp.

Figure 1. Training and testing encoding models from ECoG. (A) Patients named objects, and spectral estimation was performed on their neural signals to produce mean power over a variety of frequency bands and temporal windows (only high-gamma shown here). A subset of neural features (particular frequency bands at particular time windows at particular electrodes) was selected for use in the encoding model. (B) Linear ridge regression was used to learn a neural encoding model P, which maps from semantic attribute ratings S to neural feature values N. To decode a new neural activity pattern n generated by an untrained object, n is compared via cosine distance to a set of predicted neural activity patterns generated by applying P to a catalogue of possible objects and their semantic attributes.

Figure 2. Zero-shot mean rank accuracy decoding performance by patient. Rank accuracies are reported for four encoding models: a full model that is trained and tested using all six presentations or trials of each object and all recorded frequency ranges, a high-gamma model that is trained and tested on all presentations of each object and only the high-gamma range, and restricted data versions of these models that are trained on all repetitions of objects in the training set, but tested using single presentations only. Accuracies were produced through leave-one-object-out cross-validation. Monte Carlo significance test procedures were used to calculate p-values for each condition, and FDR correction (α = .05) was applied to correct for multiple comparisons. Asterisks (*) denote significant results.
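The encoding and zero-shot decoding procedure described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation. The closed-form ridge solution without an intercept, the regularization value `lam`, the rank accuracy formula (1 for the closest candidate, 0 for the farthest), the use of all objects as the candidate catalogue, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(S, N, lam=1.0):
    """Closed-form ridge regression: returns P such that S @ P approximates N.
    No intercept term is fitted (an assumption of this sketch)."""
    n_attr = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(n_attr), S.T @ N)

def cosine_distance(a, B):
    """Cosine distance between vector a and each row of B."""
    a_norm = a / np.linalg.norm(a)
    B_norm = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - B_norm @ a_norm

def rank_accuracy(distances, true_index):
    """Assumed definition: 1.0 if the true object is the closest candidate,
    0.0 if it is the farthest, linear in rank in between."""
    order = np.argsort(distances)
    rank = int(np.where(order == true_index)[0][0])  # 0 means best match
    return 1.0 - rank / (len(distances) - 1)

def leave_one_object_out(S, N, lam=1.0):
    """Mean rank accuracy when each object in turn is held out of training."""
    n_objects = S.shape[0]
    scores = []
    for held_out in range(n_objects):
        train = np.arange(n_objects) != held_out
        P = fit_ridge(S[train], N[train], lam)
        predicted = S @ P  # predicted patterns for every candidate object
        d = cosine_distance(N[held_out], predicted)
        scores.append(rank_accuracy(d, held_out))
    return float(np.mean(scores))

# Synthetic stand-ins: 40 objects, 8 semantic attributes, 30 neural features.
rng = np.random.default_rng(0)
S = rng.normal(size=(40, 8))
P_true = rng.normal(size=(8, 30))
N = S @ P_true + 0.1 * rng.normal(size=(40, 30))

mra = leave_one_object_out(S, N, lam=1.0)
print(f"mean rank accuracy: {mra:.3f}")
```

With this low-noise synthetic data the mean rank accuracy should be close to 1; shuffling the rows of `N` relative to `S` gives a rough picture of chance-level behaviour.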

Figure 3. Decoding accuracies for sliding windows time-locked to stimulus onsets. Bar graphs are histograms of speech onset times. MRA traces were calculated using a sliding window of 250 ms and a step size of 31.25 ms. The plotted times correspond to the center of the extraction window, and thus may contain information spanning -125 to +125 ms about the center. Dashed lines represent patient-specific chance performance, calculated as the mean MRA during the baseline period. Black lines above each trace indicate windows where MRA was significantly greater than baseline.
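The window arithmetic in the Figure 3 caption can be made concrete with a short sketch. The 250 ms width and 31.25 ms step come from the caption; the analysis range of -500 to 1500 ms and the rule that baseline windows must end at or before stimulus onset are hypothetical choices for illustration only.

```python
import numpy as np

def sliding_windows(t_start_ms, t_end_ms, width_ms=250.0, step_ms=31.25):
    """Return (starts, centers, ends) in ms for all windows that fit
    entirely inside [t_start_ms, t_end_ms]."""
    n = int(np.floor((t_end_ms - t_start_ms - width_ms) / step_ms)) + 1
    starts = t_start_ms + step_ms * np.arange(n)
    return starts, starts + width_ms / 2, starts + width_ms

# Hypothetical analysis range relative to stimulus onset at 0 ms.
starts, centers, ends = sliding_windows(-500.0, 1500.0)

# Each plotted point sits at a window center and covers center -125 to +125 ms.
# Assumed baseline rule: windows that end at or before stimulus onset.
baseline = ends <= 0.0

print(len(starts), centers[0], centers[-1], int(baseline.sum()))
```

Because consecutive windows overlap by 218.75 ms, neighbouring points on an MRA trace share most of their underlying data, which is worth keeping in mind when reading onset latencies from such a plot.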

Figure 4. Significant electrode locations and semantic dimensions encoded in high-gamma activity for P1, P2, and P3. (A) Encoding models were built for the top 3 patients that mapped from the semantic space to the high-gamma responses for each electrode. The red color scale represents the p-value of the correlation between predicted and observed high-gamma, with significant electrodes (FDR-corrected across all electrodes and subjects, α = 0.05) indicated with a yellow ring. (B) Bar plots report correlation coefficients (absolute value, with the sign of the correlation displayed above each bar) for each of the four identified semantic dimensions (i.e. PCs), for each of the significant electrodes along the basal occipitotemporal cortex. Asterisks (*) denote significant correlations (FDR-corrected across all significant electrodes and PCs, α = .05).
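The captions state that p-values were FDR-corrected but do not name the procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, one common choice, applied to a hypothetical list of per-electrode p-values.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tests declared significant by the Benjamini-Hochberg
    step-up procedure at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m  # i/m * alpha for sorted rank i
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest sorted rank under its threshold
        significant[order[:k + 1]] = True  # reject that test and all smaller p-values
    return significant

# Hypothetical p-values, one per electrode (not data from the study).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
mask = benjamini_hochberg(p_vals, alpha=0.05)
print(mask)
```

In this example only the two smallest p-values survive correction, even though four of the six are below an uncorrected threshold of .05.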

[Figure graphics not reproduced. Recoverable labels: example semantic attributes (Metal, Edible, Heavy, ..., mth attribute); axis label "Frequency (Hz)".]

[Bar plot legend: Mean across presentations, all frequencies; Mean across presentations, high-gamma; Single presentation, all frequencies; Single presentation, high-gamma.]

[Figure 4 bar plots: vertical axis from 0 to 0.7; legend: Manmade, Large, Manipulate, Edible.]