Scholarly article on topic 'The Proteomic Landscape of Triple-Negative Breast Cancer'

The Proteomic Landscape of Triple-Negative Breast Cancer Academic research paper on "Biological sciences"

Academic journal
Cell Reports

Abstract of research paper on Biological sciences, author of scientific article — Robert T. Lawrence, Elizabeth M. Perez, Daniel Hernández, Chris P. Miller, Kelsey M. Haas, et al.

Summary Triple-negative breast cancer is a heterogeneous disease characterized by poor clinical outcomes and a shortage of targeted treatment options. To discover molecular features of triple-negative breast cancer, we performed quantitative proteomics analysis of twenty human-derived breast cell lines and four primary breast tumors to a depth of more than 12,000 distinct proteins. We used this data to identify breast cancer subtypes at the protein level and demonstrate the precise quantification of biomarkers, signaling proteins, and biological pathways by mass spectrometry. We integrated proteomics data with exome sequence resources to identify genomic aberrations that affect protein expression. We performed a high-throughput drug screen to identify protein markers of drug sensitivity and understand the mechanisms of drug resistance. The genome and proteome provide complementary information that, when combined, yield a powerful engine for therapeutic discovery. This resource is available to the cancer research community to catalyze further analysis and investigation.



In Brief

Lawrence et al. conducted a deep proteomic characterization of triple-negative breast cancer cell lines and tissues using mass spectrometry. They integrate these results with data generated in-house and from publicly accessible genomics and drug sensitivity resources. Quantitative proteomics is presented as a powerful addition to the expanding cancer analysis toolbox.


• Label-free deep proteome analysis of 24 human breast specimens

• Protein expression profiles demonstrate diversity of the breast cancer proteome

• Integrative analysis of proteomics, genomics, and drug sensitivity data

Lawrence et al., 2015, Cell Reports 11, 630-644, April 28, 2015 ©2015 The Authors

http://dx.doi.org/10.1016/j.celrep.2015.03.050




The Proteomic Landscape of Triple-Negative Breast Cancer

Robert T. Lawrence,1 Elizabeth M. Perez,1 Daniel Hernández,1 Chris P. Miller,2,3 Kelsey M. Haas,1 Hanna Y. Irie,4 Su-In Lee,1,5 C. Anthony Blau,2,3 and Judit Villén1,*

1Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA

2Center for Cancer Innovation, University of Washington, Seattle, WA 98109, USA

3Department of Medicine, Division of Hematology, University of Washington, Seattle, WA 98195, USA

4Icahn School of Medicine, Mount Sinai, New York, NY 10029, USA

5Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA

*Correspondence:

This is an open access article under the CC BY-NC-ND license (




A key challenge for medicine in the 21st century is to harness the predictive power of molecular data to eradicate cancer (Arteaga and Baselga, 2012; Vidal et al., 2012; Weinstein et al., 1997). Like other cancers, breast cancer is caused by a series of inherited and/or acquired genetic aberrations that eventually lead to uncontrolled cell proliferation and metastasis. The diverse genetic drivers of breast cancer have been characterized in exquisite detail (Banerji et al., 2012; Curtis et al., 2012; Perou et al., 2000; Prat and Perou, 2011; Cancer Genome Atlas Network, 2012; Vogelstein et al., 2013). However, characterization of the proteome has lagged behind.

At the functional level, relevant genomic aberrations affect cellular functions by altering the activity and abundance of proteins. These effects are context specific and very much depend on the unique catalog of proteins expressed by different cell types. For example, a mutation in the BRAF kinase might have different functional outcomes in skin cancer than in liver or breast cancer. In addition to driving cellular functions, proteins are the most actionable and drug-treatable cellular components. Therefore, protein measurements are important to understand breast cancer and delineate breast cancer therapies.

In fact, protein measurements are being used today to classify breast cancer types according to their receptor status, in which the presence or absence of three cellular receptors (estrogen receptor ESR1, progesterone receptor PGR, and human epidermal growth factor receptor-2 ERBB2) is assessed via immunohistochemistry. Despite the reduced number of molecular features measured, this classification is the most useful today for chemotherapy selection. Irrespective of genomic aberrations, more than 80% of breast cancers express one or more of these receptors (Howlader et al., 2014) and are treatable by hormone deprivation and/or ERBB2 inhibition (Untch et al., 2014). Targeted therapies are not currently available for tumors that do not express these receptors, which are collectively referred to as triple-negative breast cancer (TNBC). TNBC is an important and unmet clinical problem. It tends to be more aggressive, is correlated with worse prognosis than receptor-positive subtypes (Hudis and Gianni, 2011), and is more common among young and African American women (Howlader et al., 2014). Identifying subtypes within the TNBC type, and proteins within those subtypes that can serve as therapeutic targets, will be extremely valuable.

Among protein measurements, reverse-phase protein arrays (RPPA) have been one of the most widely adopted tools for integrated genomics and drug sensitivity analysis, but a key limitation of RPPA technology is its lack of proteome coverage, generally less than 200 analytes (Tibes et al., 2006). As such, mRNA expression has been used as a proxy for protein levels, despite mediocre quantitative concordance (Gygi et al., 1999; Maier et al., 2009). Both mRNA and protein expression using RPPA outperform genomic data as predictors of drug sensitivity and clinical outcomes (Costello et al., 2014; Yuan et al., 2014). These results highlight the potential of systematic protein expression analyses for breast cancer research in general and drug discovery in particular.

It is an excellent time to further investigate the TNBC proteome using more comprehensive techniques. Mass spectrometry in the form of "shotgun proteomics" is highly quantitative, and has reached the speed and sensitivity to measure proteomes at a depth comparable to gene expression studies (Kim et al., 2014; Wilhelm et al., 2014). In fact, proteomics is already making an impact in breast cancer research (Geiger et al., 2012a; Moghaddas Gholami et al., 2013; Kennedy et al., 2014); yet, to show its full potential, proteomics needs to be integrated with other types of big data.

Here we present an integrative approach using quantitative mass spectrometry to characterize TNBC proteomes both as readouts of genetic abnormality and as predictors of drug sensitivity. The goals of this work were to refine our understanding of breast cancer biology as an integrated proteogenomic landscape and to identify molecular diagnostic markers to improve drug selection in TNBC.


The TNBC Proteome

We assembled a panel of 20 human breast cell lines and four clinical tumors to analyze the proteomic landscape of TNBC (Figure 1A). These included 16 triple-negative cell lines covering mesenchymal-, luminal-, and basal-like subtypes, as well as three receptor-positive and one non-tumorigenic cell line to serve as a basis for comparison (Lehmann et al., 2011; Neve et al., 2006). Primary tumor tissues were derived from patients with metastatic TNBC (stages II to III). Cell lines were cultured and analyzed in duplicate to assess the precision of protein quantification. Proteins were digested in parallel with either lysyl-endopeptidase (LysC) or trypsin and separated at the peptide level into five fractions to enhance proteome coverage (Figure 1B). We used liquid chromatography-tandem mass spectrometry (LC-MS/MS) on a hybrid quadrupole-orbitrap mass spectrometer to acquire quantitative profiles of the peptides present in each fraction.

In total, more than 450 peptide fractions were analyzed, yielding approximately 20 million high-resolution mass spectra. Across the entire dataset, we identified 289,819 non-redundant peptide sequences mapping to at least 12,775 distinct proteins encoded by 11,466 genes (protein false discovery rate [FDR] < 1%). To facilitate comparison of specific protein isoforms, we additionally retained in our data truncated protein isoforms having high sequence coverage, bringing the total proteins analyzed to 15,524. The median protein had 15 peptide matches, four isoform-specific peptide matches, and shared peptides with only one other protein in the dataset (Figures S1A-S1C). Median protein sequence coverage was 52%.

The number of proteins identified was consistent across cell lines, tissues, and replicates. On average, 80% of proteins were identified in both replicates. At least 9,000 proteins were found in each cell line (Figure 1C), which agrees well with other recent deep proteome experiments (Beck et al., 2011; Geiger et al., 2012b; Moghaddas Gholami et al., 2013; Nagaraj et al., 2011). These proteins represent 56% of the 20,537 genes annotated in Uniprot/Swiss-Prot and at least 75% of genes included in the catalog of somatic mutations in cancer (COSMIC) (Figure 1D). As expected, we achieved near complete coverage of gene ontology categories involved in core cellular functions, such as primary metabolism, protein synthesis, and general transcription, and lower coverage of tissue-specific categories, such as transcription factors and receptors (Figure 1E).

To infer protein absolute abundances (as copies/cell), we used intensity-based absolute quantitation (iBAQ). Quantitative reproducibility between biological replicates was uniformly high across all cell lines, with an average R² equal to 0.92 (Figure 1F; Figure S1D). Proteins that were highly abundant and identified in all samples were the most reproducibly quantified (median CV = 16%, Figure S1E). By comparison, the average R² between different cell lines was 0.72, indicating significant differences in global protein expression.
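The quantities in this paragraph can be illustrated with a minimal sketch. It assumes the commonly used iBAQ definition (summed peptide intensity divided by the number of theoretically observable peptides), a tryptic digest with a 6 to 30 residue length window, and R² computed on untransformed values; none of these settings is stated in the text, and the sequence and intensities are invented toy values. Calibration to copies/cell is not shown.

```python
import re
from statistics import mean, stdev


def count_observable_peptides(sequence, min_len=6, max_len=30):
    """Count in-silico tryptic peptides (cleave after K/R, not before P).

    The 6-30 residue window is an assumed convention, not taken from the paper.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return sum(1 for p in peptides if min_len <= len(p) <= max_len)


def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the number of observable peptides."""
    n_observable = count_observable_peptides(sequence)
    if n_observable == 0:
        raise ValueError("no observable peptides for this sequence")
    return sum(peptide_intensities) / n_observable


def r_squared(x, y):
    """Squared Pearson correlation between two replicate measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def cv_percent(values):
    """Coefficient of variation (sample SD over mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical toy protein with two observable peptides.
toy_sequence = "MAGTLLKSEDVVNR"
toy_ibaq = ibaq([100.0, 300.0], toy_sequence)
```

With per-protein iBAQ values for two replicates in hand, `r_squared(replicate_1, replicate_2)` gives the replicate agreement and `cv_percent([value_1, value_2])` the per-protein variability.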

The data presented here comprise more than 200,000 quantitative measurements of absolute protein abundance (Table S1). Innovations in instrumentation and extensive peptide fractionation prior to analysis have greatly increased the sensitivity and reproducibility of shotgun proteomics analysis, and our quantitative results compared favorably with a recent targeted proteomics study on many of the same cell lines (Kennedy et al., 2014) completed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC). To facilitate use and dissemination of the data, we have developed a web resource in which protein abundances can be queried and correlated to genomic and drug sensitivity data, as presented below.

To demonstrate the validity of our dataset as a quantitative resource, we examined several clinical breast cancer biomarkers including ESR1, PGR, and ERBB2 (Figure 2). These measurements accurately reproduce the known classification of cell lines based on immunocytochemistry (Subik et al., 2010) and correspond with known copy-number (CN) amplifications. In contrast to antibody staining, which assesses the presence or absence of expression, mass spectrometry provides sensitive and precise quantitation over a broad range. This is an important consideration for markers such as Ki-67, which are dynamically expressed in all cells. As another example, the cell line MDA-MB-453 stains negative for ERBB2 (Vranic et al., 2011) and was classified as a TNBC cell line (Neve et al., 2006), despite bearing a CN amplification. However, our results show that MDA-MB-453 expressed ERBB2 at levels 20-fold higher than the median, compared to several-hundred-fold overexpression of ERBB2 by cell lines such as BT474 and SKBR3.

Quantitative Analysis of TNBC Proteomic Subtypes

Molecular subtyping using gene expression or copy-number aberration has been used extensively to characterize clinical breast cancer specimens and cell lines (Banerji et al., 2012; Lehmann et al., 2011; Prat and Perou, 2011). We used hierarchical clustering to identify patterns based on correlation of protein expression profiles. This approach classified the panel of cell lines into two overarching groups containing four clusters (Figure 3A). To illustrate the relationship between driver gene alterations and proteome profiles, we show the most frequent census mutations and copy-number aberrations for each cell line (Figure 3A, top).
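As an illustration of the clustering step, the following sketch rescales each protein to the range 0 to 1 across samples, uses one minus Pearson's correlation as the distance between samples, and merges clusters agglomeratively. Average linkage is an assumption made here for simplicity (the linkage rule is not stated in the text), and the sample names and abundances are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale_proteins(profiles):
    """Rescale each protein (column) to 0..1 across all samples."""
    names = list(profiles)
    columns = list(zip(*(profiles[n] for n in names)))
    scaled_cols = []
    for col in columns:
        lo, hi = min(col), max(col)
        scaled_cols.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    return {n: [col[i] for col in scaled_cols] for i, n in enumerate(names)}


def cluster(profiles, n_clusters):
    """Agglomerative clustering with correlation distance, average linkage (assumed)."""
    scaled = scale_proteins(profiles)
    clusters = [[name] for name in scaled]

    def linkage(a, b):
        # Mean of (1 - r) over all cross-cluster sample pairs.
        dists = [1.0 - pearson(scaled[i], scaled[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical abundances for four samples and four proteins.
toy_profiles = {
    "line_a": [10, 9, 1, 2],
    "line_b": [9, 10, 2, 1],
    "line_c": [1, 2, 10, 9],
    "line_d": [2, 1, 9, 10],
}
groups = cluster(toy_profiles, 2)
```

For a dataset of the size described here, a library routine such as SciPy's hierarchical clustering would be the practical choice; the pure-Python version only makes the steps explicit.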

[Figure 1 graphic not reproduced. Legible panel text: workflow labels "Collect fractions (% MeCN, pH ~10)", "Identify (MS2)", and "Quantify (MS1)"; a bar chart annotated "Mean = 9,754 ± 451"; gene ontology representation with category sizes: ribosomal protein (157), basal transcription factor (39), nuclease (124), deacetylase (17), G-protein (175), phosphatase (150), cytoskeletal protein (440), protein kinase (295), protease (274), protein phosphatase (73), extracellular matrix (176), ubiquitin-protein ligase (103), transporter (289), transcription factor (794), receptor (387), ion channel (74); replicate scatter plot of protein abundance (iBAQ), R² = 0.9372.]

Figure 1. Mass Spectrometry-Based Profiling of TNBC

(A) Overview of samples analyzed is shown. N, normal epithelial; +, ER/PR/ERBB2+; L, luminal-like; M, mesenchymal-like; B, basal-like; ?, not matched. TNBC cell line classifications are according to Lehmann et al. (2011).

(B) Workflow of proteomics sample preparation and data collection is shown.

(C) Average number of proteins identified in each replicate (blue bars) and total number of proteins for each cell line (green bars) are shown. Error bars represent SD.

(D) Percentages of identified proteins relative to the Uniprot/Swiss-Prot database (left) and the COSMIC census (right) are shown.

(E) Number and percentage representations of indicated gene ontology categories are shown.

(F) Representative scatter plot for cell line SKBR3 replicate protein measurements shows quantitative reproducibility of iBAQ protein abundance.

Cell lines with similar genetic abnormalities tended to cluster together. As has been observed previously (Cancer Genome Atlas Network, 2012), PIK3CA mutations were associated with luminal breast cancer subtypes (80% of the cell lines in cluster 1), whereas TP53 mutations were characteristic of TNBC (100% of the cell lines in clusters 3 and 4). Mutations in the tumor suppressor NF1 were exclusive to the mesenchymal-like subtype (cluster 4) and BCR mutations were exclusive to luminal cells (cluster 1).

Protein expression patterns within subtype clusters were still highly cell-type specific. To better illustrate this, we used principal component analysis (PCA) to project the distances between each proteome onto a two-dimensional coordinate system. Some of the sample proteomes formed tight clusters, while others were more distantly related to those in the same group (Figure 3B). Additional principal component dimensions are necessary to capture the proximity of cell lines, such as MFM223, BT474, and HCC1599, to their respective subtypes.
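The projection described above can be sketched in a few lines. The version below centers each protein, forms the covariance matrix, and extracts the two leading principal components by power iteration with deflation; this numerical method is chosen here for self-containment and is not claimed to be what the authors used. The input values are hypothetical.

```python
import math


def pca_2d(rows, iterations=500):
    """Return (PC1, PC2) scores for each sample (row) of a samples x proteins table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    components = []
    for _component in range(2):
        v = [1.0 / (k + 1) for k in range(p)]  # deterministic starting vector
        for _step in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        eigenvalue = sum(
            v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
        )
        components.append(v)
        # Deflate: remove the variance explained by this component.
        cov = [
            [cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
            for a in range(p)
        ]
    return [[sum(x[j] * c[j] for j in range(p)) for c in components] for x in centered]


# Hypothetical abundances: four samples (rows) by three proteins (columns).
toy_rows = [
    [10.0, 9.0, 1.0],
    [9.0, 10.0, 2.0],
    [1.0, 2.0, 10.0],
    [2.0, 1.0, 9.0],
]
scores = pca_2d(toy_rows)
```

In this toy table the first two samples fall on one side of PC1 and the last two on the other, while PC2 carries much less variance, which mirrors the idea that some proteomes separate cleanly in two dimensions and others need additional components.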


Figure 2. Quantification of Clinical Breast Cancer Biomarkers

ESR1, estrogen receptor; PGR, progesterone receptor; ERBB2, human epidermal growth factor receptor-2; TP53, tumor protein p53; MKI67, Ki-67 antigen; EGFR, human epidermal growth factor receptor. Sample labels are shown (bottom). Absolute protein abundance was calculated using iBAQ. Error bars represent SD. Red dots indicate gene CN amplification (more than seven copies).

Intra-subtype correlation was also modest in earlier classification studies using mRNA expression (Lehmann et al., 2011), and the differences in mRNA may be further amplified at the protein level. The heterogeneity of protein expression underscores the importance of data-driven cell line selection in cancer research.

Accurate analysis of genes, transcripts, or proteins from heterogeneous clinical specimens represents a major challenge for precision medicine. The proteins expressed >10-fold in tumors versus the cell lines were enriched with proteins from blood cells and plasma (p < 0.001). These proteins accounted for as much as 20% of the total proteome intensity from the tumors. Given that TNBC cell lines should better represent the cellular component of the tumor, we correlated tumor samples to the centroids from each cell line cluster to identify to which proteomic subtype they belonged; we found that they were all more similar to clusters 3 and 4, an observation that also can be made based on PCA (Figure 3B).
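The centroid comparison can be sketched as follows, assuming that similarity is measured by Pearson's correlation between a tumor profile and the mean profile of each cluster (the text does not name the similarity measure). Cluster names and abundances are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def centroid(profiles):
    """Mean abundance of each protein across the members of one cluster."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def nearest_cluster(sample, clusters):
    """Return the best-correlated cluster name and all correlation scores."""
    scores = {
        name: pearson(sample, centroid(members))
        for name, members in clusters.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical cluster members and tumor profile (same protein order throughout).
toy_clusters = {
    "cluster_3": [[8.0, 7.0, 1.0, 2.0], [9.0, 8.0, 2.0, 1.0]],
    "cluster_4": [[1.0, 2.0, 9.0, 8.0], [2.0, 1.0, 8.0, 9.0]],
}
toy_tumor = [7.5, 8.0, 3.0, 2.5]
best_cluster, cluster_scores = nearest_cluster(toy_tumor, toy_clusters)
```

Because tumor tissue also contains blood and plasma proteins, one might restrict the comparison to proteins quantified in the cell lines before computing the correlations; that filtering step is left out of the sketch.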

Nevertheless, many proteins significantly over- or underexpressed within each cluster could be identified. We were particularly interested in potential drug targets and proteins known to be involved in cancer biology. For example, the protein STAT5A, a pro-survival transcription factor, was expressed at high levels in the tumors and mesenchymal-like cell lines (Figure 3C). Using the first cluster as an example, we show how these proteins can be identified using our web-based resource (Figure S2A). The transcription factor FOXA1 was exclusively expressed by luminal-like cells, whereas TGFB1 was not found (Figure S2B). PPM1A, a protein involved in the suppression of TGF-β signaling pathways (Lin et al., 2006), was decreased in TNBC, while many proteins involved in immunity and metastasis, such as POSTN, MYLK, and HLA-A, were expressed at higher levels in TNBC (Figure S3A). Some of these proteins are thought to be provided by tumor-infiltrating immune cells and fibroblasts (Quail and Joyce, 2013), but here we show they also are abundant in the homogeneous conditions of cell culture.

The composition of each cluster showed striking similarity to subtypes defined by mRNA expression arrays and morphological studies (Kenny et al., 2007; Lehmann et al., 2011; Neve et al., 2006). Cluster 1 contained the luminal breast cancer cell lines SKBR3, MCF7, and BT474 as well as luminal-androgen-receptor cell line MFM-223, which expresses the androgen receptor protein, and MDA-MB-453, which overexpresses ERBB2 as described above. The set of proteins that was highly expressed by these cell lines was enriched for functions typically expected of cancer cells, including insulin and ErbB signaling, glycolysis, and nucleotide excision repair (Figure 3D). Cluster 2, most similar to the basal-like 2 gene expression subtype, contained DU4475, SW527, HCC1806, MDA-MB-436, and the normal breast epithelial cell line MCF10A. Cluster 3 included all basal-like 1 cell lines: HCC38, HCC1143, HCC1937, BT20, and MDA-MB-468. Cluster 4, containing BT549, HS578T, MDA-MB-231, and MDA-MB-157, was identical to the "mesenchymal-like/claudin-low" subtype (Lehmann et al., 2011), all showing stellate morphology in three-dimensional culture (Kenny et al., 2007) and high invasiveness in chamber assays (Neve et al., 2006). To better understand the biology of each subtype, we compared the distribution of protein abundance within gene ontology categories. Interestingly, luminal-like cells expressed higher levels of pathways associated with proliferation, such as cell cycle, growth factor signaling, metabolism, and DNA damage repair mechanisms (Figure 3E; Figure S3B). TNBC cell types, particularly the tumors and more invasive cells, expressed higher levels of pathways associated with metastasis, such as ECM-receptor interaction, cell adhesion, and angiogenesis (Figure 3E; Figure S3B). The expressions of proliferation and metastasis pathways were mutually exclusive, an observation also made in an analysis of mRNA expression profiles from claudin-low tumors (Prat et al., 2010). Thus, therapies targeting immune and metastatic signaling are an exciting avenue for TNBC treatment.

Differential Expression of Cancer-Signaling Proteins

The cancer genome has been studied extensively (Futreal et al., 2004; Vogelstein et al., 2013). We sought to characterize the abundance of proteins derived from known cancer census genes and signaling pathways (Figure 4; Figure S4). The abundance of

[Figure 3 graphic not reproduced. Legible panel text: cluster labels (Cluster 1 to Cluster 4); mutation rows including TP53 and PIK3CA; PCA sample labels; enriched pathways plotted as -log10(p value): ubiquitin-mediated proteolysis, insulin signaling pathway, nucleotide excision repair, cell cycle, citrate (tricarboxylate) cycle, glycolysis/gluconeogenesis, gap junction, focal adhesion, ECM-receptor interaction, axon guidance; abundance-distribution panels: insulin signaling, TCA cycle, basal transcription factor, nucleotide excision repair, ECM-receptor interaction, antigen presentation.]

Figure 3. The TNBC Proteome

(A) Hierarchical clustering of protein expression profiles computed using centered Pearson's correlation identified four proteome subtypes as indicated. Protein expression values were normalized to a scale from 0 to 1 prior to clustering. Frequent genetic aberrations are overlaid onto the proteome clustering results. Green circles represent exonic mutations. Red and blue circles represent CN gain (more than seven copies) or loss (0 copies), respectively. Colored background shading corresponds to cluster membership. At the time of writing, exome sequence and CN data were not available for MCF10A and SW527.

(B) Scatterplot of principal components 1 and 2. PCA was performed using protein expression profiles. Each point represents a sample. Colors represent hierarchical cluster membership from (A).

(C) Biological pathways enriched from the indicated protein clusters. Inverted log10 p values are shown.

(D) Representative example shows a protein upregulated in cluster 4 and tumors. STAT5A, signal transducer and activator of transcription 5A. Error bars represent SD.

(E) Distribution of protein abundances within each cluster (colors) for indicated biological processes. For (A-E), cluster membership is indicated by the same colors used in (A), with tumor samples indicated in yellow.


[Figure 4 graphic not reproduced. Legible panel text: y axis "Protein Abundance"; legend: Cluster 1, Cluster 2, Cluster 3, Cluster 4, Tumors; sample order: SKBR3, MCF7, BT474, MDAMB453, MFM223, DU4475, MCF10A, SW527, HCC1806, MDAMB436, HCC38, HCC1143, HCC1937, BT20, MDAMB468, BT549, HS578T, MDAMB231, MDAMB157, HCC1599, Tumor_A, Tumor_B, Tumor_C, Tumor_D; panel title "Protein Kinases".]

Figure 4. Expression of Cancer-Signaling Proteins

(A-G) Distribution of absolute abundance for each protein in the signaling network. Chart titles indicate subnetwork membership. Each data point represents a sample, color coded according to cluster membership from Figure 3A.

(H and I) Top 25 most differentially expressed proteins (highest SD between different samples) from (H) the COSMIC gene census or (I) the protein kinase superfamily are shown.

most signaling proteins spanned two to three orders of magnitude, but others were expressed similarly across all cell lines (Figures 4A-4G). These proteins included several members of the RAS-MAPK pathway, such as GRB2, HRAS/KRAS/NRAS, MEK1/2, and ERK1/2. In certain cases, expression of these proteins was associated with proteomic-based breast cancer subtypes. For example, CHEK2, HMGA2, POT1, and IL6ST were highly expressed by members of clusters 1 through 4, respectively (Figures 4H and 4I). However, protein expression was generally variable and cell-type specific. MLL3 was specifically expressed by BT474, BT20, and tumor A, which were each from different clusters (Figure 4H). HCC1806 and MDA-MB-436 specifically lacked expression of the protein kinase AKT1/2 (Figure 4B). PKCα was expressed at high levels in each of the cell lines from cluster 4, but also was highly expressed in DU4475 (Figure S4J). These results show that, despite overall concordance of whole proteome profiles with various cellular phenotypes, in most cases the expression of particular cancer proteins did not uniformly belong to one subtype or another.

The identification of proteins with very specific outliers or large dynamic range provides a valuable resource for TNBC drug development efforts. EGFR, ERBB2, ESR1, and PGR exemplify these properties (Figures 4A and S4D) and are already routine clinical targets in breast cancer, but there are many others. For example, ephrin type A receptors, which are involved in embryonic development and not normally present in adult tissues, were overexpressed by several orders of magnitude in many TNBC cell lines compared to luminal-like cells (Figure 4A). With the increasing availability of comprehensive quantitative proteomics datasets, protein expression should continue to be one of the most valuable parameters for drug development and clinical diagnostics.

Isoform-Specific Protein Expression

The identification and quantification of protein isoforms resulting from alternative splicing is a significant challenge in proteomics, arising from the reduced number of isoform-specific peptides that are amenable to analysis by mass spectrometry. For this dataset, we first relied on isoform-specific peptides to unambiguously identify proteins mapping to the same gene in the Uniprot sequence database. This led to the identification of 1,860 protein isoforms that corresponded to 844 genes, 52 of which were members of the COSMIC census. Next, we examined the relative quantification of protein isoforms. Protein isoforms share long segments of identical sequence but are missing certain protein domains, resulting in altered signal intensity from those parts of the protein.
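To make the notion of an isoform-specific peptide concrete, the sketch below digests each isoform in silico and keeps the peptides that occur in only one isoform of the gene. It assumes tryptic cleavage (after K or R, not before P) and a minimum peptide length of six residues; the two sequences are invented toy examples and are not real protein sequences.

```python
import re


def digest(sequence, min_len=6):
    """In-silico tryptic digest: cleave after K/R unless followed by P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_len}


def isoform_specific_peptides(isoforms):
    """Map each isoform id to the peptides found in that isoform only."""
    digests = {name: digest(seq) for name, seq in isoforms.items()}
    specific = {}
    for name, peptides in digests.items():
        others = set().union(*(d for other, d in digests.items() if other != name))
        specific[name] = peptides - others
    return specific


# Hypothetical isoforms sharing an N-terminal region and differing at the C terminus.
toy_isoforms = {
    "canonical": "MSTAGLLKDEVNQPLRGGSWTYEEK",
    "variant": "MSTAGLLKDEVNQPLRFSSVQLR",
}
toy_specific = isoform_specific_peptides(toy_isoforms)
```

Shared peptides (here `MSTAGLLK` and `DEVNQPLR`) cannot distinguish the isoforms, which is why only the unique C-terminal peptides count as unambiguous evidence. A real analysis would search against every entry in the sequence database, not just isoforms of one gene.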

We relied on manual inspection to analyze the expression of isoforms for proteins involved in cancer progression. For most proteins, different isoforms were nearly perfectly correlated, indicating no difference in expression of specific isoforms, but there were notable exceptions. For example, we identified variants in the p65 subunit of the transcription factor NF-κB, the tumor antigen CD47, and focal adhesion kinase PTK2. The protein sequence of the NF-κB p65 variant is identical to the canonical sequence until proline 344, followed by the read-through translation of 33 amino acids and an early stop (Figure 5A). The alternative sequence lacks many important regulatory regions including the residues phosphorylated by IKKB that directly affect its transcriptional activity (Sakurai et al., 1999). The p65 variant was detected in two cell lines and was expressed at higher levels in all four tumor samples (Figure 5B). This result was confirmed by an isoform-specific peptide, FSSVQLR, which matched no other entry in the Uniprot protein sequence database (Figure 5A). This finding was especially interesting since the tumor proteomes were enriched in immunomodulatory pathways. NF-κB modulates the inflammatory response and plays an important role in cancer by promoting metastasis (Huber et al., 2004; Luo et al., 2004).

CD47 is an atypical G protein-coupled receptor with five membrane-spanning domains that participates in integrin signaling and is proposed to have many important roles in cancer (Sick et al., 2012). We detected two of the four known alternative splice variants that differentially encode the cytoplasmic tail. The cell line DU4475 expressed higher levels of the long isoform (Figures 5C and 5D), which is highly expressed in neurons (Brown and Frazier, 2001). Although little is known about the functional differences between the isoforms, it is likely that this tail mediates intracellular signaling downstream of the receptor.

PTK2, or focal adhesion kinase 1, is a tyrosine protein kinase involved in cell migration (McLean et al., 2005). We confirmed the presence of an N-terminally truncated form of this protein, which lacks the FERM (4.1-Ezrin-Radixin-Moesin) domain (Figure 5E). The FERM domain regulates PTK2 localization and interaction with other proteins to affect its activity (Frame et al., 2010). Interestingly, the full-length form appeared to be expressed higher in HS578T and BT20 cells based on the relative intensity of N-terminal versus C-terminal peptides (Figures 5E and 5F).
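The comparison of N-terminal and C-terminal peptide intensities can be expressed as a simple ratio. In this sketch each peptide is described by its start position in the full-length protein and its measured intensity; the boundary position (400) and all intensities are placeholders chosen for illustration, not values from the paper.

```python
from statistics import mean


def n_to_c_ratio(peptides, boundary):
    """Mean intensity of peptides before `boundary` over those at or after it.

    peptides: list of (start_position, intensity) tuples.
    boundary: residue index separating the N-terminal region from the rest
              (a hypothetical value here).
    """
    n_term = [intensity for start, intensity in peptides if start < boundary]
    c_term = [intensity for start, intensity in peptides if start >= boundary]
    if not n_term or not c_term:
        raise ValueError("need peptides on both sides of the boundary")
    return mean(n_term) / mean(c_term)


# Hypothetical samples: one dominated by full-length protein, one by a truncated form.
full_length_like = [(50, 90.0), (210, 110.0), (600, 100.0), (900, 100.0)]
truncated_like = [(50, 10.0), (210, 30.0), (600, 100.0), (900, 100.0)]

ratio_full = n_to_c_ratio(full_length_like, boundary=400)
ratio_truncated = n_to_c_ratio(truncated_like, boundary=400)
```

A ratio near 1 is consistent with mostly full-length protein, while a ratio well below 1 suggests that a form lacking the N-terminal region contributes much of the signal. Because peptides differ in ionization efficiency, such ratios are more informative when compared across samples than when read as absolute stoichiometry.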

The differential expression of structural protein variants, many of which occur post-translationally, could be a significant regulatory mechanism in cancer. Further work will be necessary to systematically identify and accurately quantify these events.

Proteogenomic Analysis Identifies Signatures of Driver Mutations

Genetic aberrations such as sequence mutations and amplifications, which typically occur in regulatory proteins, can have pleiotropic downstream effects on other proteins that more directly drive cancer phenotypes. We integrated publicly available exome sequence and gene CN data from COSMIC (Forbes et al., 2011) with proteome profiles from 18 cell lines. Protein abundance trended positively with gene CN. The average expression of all proteins in each CN bin correlated strongly with CN (R = 0.96). However, it was more variable and correlated poorly on a pairwise basis (n = 56,579, R = 0.19) (Figure 6A). For example, the cancer census gene NDRG1 was not correlated with CN (R = -0.06) and was not highly expressed even when amplified (Figure 6B). This poor correlation is expected for proteins under high transcriptional, translational, or proteasomal control.
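The contrast between the strong bin-level correlation and the weak pairwise correlation follows from averaging: the mean within each CN bin removes protein-to-protein scatter, leaving only the trend. The sketch below shows this effect on invented toy values; it is not a reanalysis of the paper's data.

```python
import math
from collections import defaultdict


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binned_means(copy_numbers, abundances):
    """Average abundance of all gene/sample pairs sharing a copy number."""
    bins = defaultdict(list)
    for cn, value in zip(copy_numbers, abundances):
        bins[cn].append(value)
    levels = sorted(bins)
    return levels, [sum(bins[cn]) / len(bins[cn]) for cn in levels]


# Hypothetical gene copy numbers and log-scale protein abundances.
toy_cn = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
toy_log_abundance = [1.0, 5.0, 3.0, 1.5, 5.5, 3.2, 2.0, 6.0, 3.4, 2.2, 6.2, 4.2]

r_pairwise = pearson(toy_cn, toy_log_abundance)
cn_levels, cn_means = binned_means(toy_cn, toy_log_abundance)
r_binned = pearson(cn_levels, cn_means)
```

In this toy example the pairwise correlation is weak (about 0.26) while the binned means rise almost perfectly with copy number, the same qualitative pattern as the R = 0.19 versus R = 0.96 reported above.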

Driver mutations occur frequently in regulatory proteins such as protein kinases, E3 ubiquitin ligases, and transcription factors, which alter the physiology of the cell by modulating the abundance or activity of other proteins. For example, our data showed that DU4475, the cell line with an APC mutation, expressed more than 4-fold median levels of β-catenin (p = 3.3 × 10⁻⁴, heteroscedastic t test) (Table S1), which APC normally targets for degradation. Initially we characterized cellular subtypes according to protein abundance profiles and asked whether frequent genetic mutations were associated with these subtypes (Figure 3). An alternative analysis approach is to group cell lines by their mutational status and ask whether the abundance of specific proteins is associated with these mutations, as in the β-catenin and APC example.

We reasoned that mutations in certain driver genes, such as those in the same signaling pathway, would likely converge to regulate common effectors. To determine the global effects of driver gene mutations on protein expression, we systematically evaluated gene-protein associations for frequently mutated census genes (n ≥ 3 cell lines) by comparing the abundance of each protein in cell lines with versus without a mutation, and plotted this information as a network. Driver genes and their protein targets formed clusters according to their shared associations (Figure 6C). The number of significant (p < 0.001) associations for each gene ranged from 11 to 320 (Figure 6D). The network degree distribution fit an exponential function (R² = 0.99), revealing 233 hub proteins, each associated with three or more cancer census genes (Figure 6E). Cell cycle was the only significantly enriched gene ontology term among hub proteins (p = 5.66 × 10⁻⁴). While not surprising, this result demonstrates that dysregulation of cell-cycle protein abundance may be a common effect of diverse genetic mutations.
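The two steps of this analysis, a mutant-versus-normal comparison per protein and a degree count over the resulting network, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used here: the test applied to the network is not specified in this section, so Welch's (heteroscedastic) t statistic is used by analogy with the APC example, and the gene and protein labels (`GENE_A`, `PROT_1`, and so on) are placeholders. Converting the statistic to a p value needs a t-distribution CDF, which the standard library does not provide; `scipy.stats.ttest_ind` with `equal_var=False` is one option.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na  # squared standard error, group A
    vb = variance(group_b) / nb  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def degree_summary(edges, hub_threshold=3):
    """Count associations per gene and per protein; list hub proteins."""
    unique_edges = set(edges)  # ignore duplicated (gene, protein) pairs
    out_degree = Counter(gene for gene, _ in unique_edges)
    in_degree = Counter(protein for _, protein in unique_edges)
    hubs = sorted(p for p, d in in_degree.items() if d >= hub_threshold)
    return out_degree, in_degree, hubs


# Hypothetical abundances of one protein in mutant and non-mutant cell lines.
t_stat, dof = welch_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])

# Hypothetical significant associations (mutated gene, associated protein).
edges = [
    ("GENE_A", "PROT_1"), ("GENE_A", "PROT_2"), ("GENE_A", "PROT_3"),
    ("GENE_B", "PROT_2"), ("GENE_B", "PROT_3"),
    ("GENE_C", "PROT_2"), ("GENE_C", "PROT_3"), ("GENE_C", "PROT_4"),
]
out_degree, in_degree, hubs = degree_summary(edges)
```

Here `out_degree` corresponds to the per-gene counts of Figure 6D and `in_degree` to the node degree distribution of Figure 6E. The default threshold of three mirrors the definition of a hub protein given above.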

On an individual basis, proteins regulated downstream of genetic lesions (e.g., TP53 loss of function) might represent more suitable therapeutic targets than the gene product itself. Several highly significant (p < 0.001) gene-protein associations are

[Figure 5 graphic. Panel titles: RELA (NF-kB subunit p65), with the read-through peptide FSSVQLR; CD47 (integrin-associated protein), isoforms 1-4; PTK2 (focal adhesion kinase 1), long and short forms with FERM, tyrosine kinase, and focal adhesion domains. Scatterplot axes: NFKB p65 (iBAQ), CD47 isoform 3 (iBAQ), PTK2 short form (iBAQ).]

Figure 5. Differential Expression of Protein Isoforms

(A) Schematic of RELA (NF-kB subunit p65) mRNA sequence variants and intensity-based quantification of the isoform-specific peptide FSSVQLR in each sample. Peptide intensity was divided by the total proteome intensity for normalization. The location of an exon read-through event is indicated.

(B) Scatterplot shows the full-length NF-kB protein versus the read-through variant, highlighting off-diagonal samples.

(C) Four alternative splice variants encode the cytoplasmic tail of integrin-associated protein CD47. The sequence of these variants is shown along with the quantification of the peptide specific to isoform 1, AVEEPLNAFK.

(D) Scatterplot shows CD47 isoform 1 versus isoform 3, highlighting off-diagonal samples.

(E) Schematic shows N-terminally truncated form of focal adhesion kinase PTK2 and quantification of N-terminal/C-terminal intensity in each sample.

(F) Scatterplot shows PTK2 long form versus short form, highlighting off-diagonal samples.

[Figure 6 graphic. Recoverable labels: normalized protein abundance versus Copy Number (n = 56,579); NDRG1 abundance by cell line (R = -0.06); node degree distribution (R² = 0.9909; cell cycle enrichment annotated); bar charts comparing mutant and normal cell lines.]

Figure 6. Proteogenomic Associations

(A) Boxplot shows the relationship of protein abundance to gene CN. Protein abundances were row-normalized to a scale of 0 to 1 to account for differences in absolute expression.

(B) NDRG1 (N-myc downstream regulated gene 1) is a representative protein that was not correlated with CN. CN > 6 highlighted in red. R represents Pearson's correlation. Error bars represent SD between replicate measurements.

(C) Network of gene-protein associations. Each edge represents an association (p < 0.001) between a mutated census gene (gray nodes) and protein expression (yellow nodes). Only genes from the COSMIC census mutated in at least three cell lines were analyzed. Node size represents the number of connections. The network was plotted in Cytoscape using edge-weighted spring-embedded layout so that genes with common associations cluster together.

(D) Number of outgoing associations for each mutated gene in network is shown.

(E) Number of incoming associations for each target protein in network (node degree distribution). Cell-cycle proteins were enriched among proteins with three or more associated genes (p = 5.66 × 10⁻⁴).

(F-J) Representative gene-protein associations (p < 0.001) for common genetic lesions in breast cancer. Protein is indicated in chart title, and mutated gene is shown in italics below plot. Error bars represent SEM.

shown (Figures 6F-6J). In the case of TP53, nearly all of the significantly associated proteins were involved in DNA metabolism and repair. One such protein was ecto-5'-nucleotidase (NT5E or CD73), a GPI-anchored cell surface enzyme involved in the production of membrane-permeable nucleosides, which can be used for nucleotide salvage (Zimmermann, 1992). Targeting it by small interfering RNA (siRNA) or small molecule inhibition (using adenosine [(α,β)-methylene] diphosphate) arrested the cell cycle and triggered apoptosis in MDA-MB-231 breast cancer cells (Zhi et al., 2010). Monoclonal antibodies against NT5E also were demonstrated to block breast cancer metastasis in vivo (Stagg et al., 2010). NT5E may be an effective drug target specifically for cancers with TP53 mutations. In addition to the discovery of potential drug targets, these proteins also could be used as markers to infer whether or not a mutation is deleterious.

Proteomics of Drug Sensitivity

To generate a resource for drug sensitivity prediction, we screened the 16 TNBC cell lines from our panel against a library of 160 compounds at eight different concentrations spanning four orders of magnitude. We used this data to determine the IC50, defined as the dose required to reach a 50% reduction in cell viability, for each drug in each cell line (Table S2). Approximately three quarters (123/160) of the compounds elicited a measurable response in at least one cell line, and each cell line was sensitive to at least five compounds at sub-micromolar doses. The distribution of responses for each drug was diverse (Figure 7A). The IC50 distribution for most drugs spanned a wide range, 790-fold on average. Some drugs were very specific with few sensitive cell lines (e.g., everolimus, methotrexate, and lapatinib), while other drugs were indiscriminate with few resistant cell lines (e.g., bortezomib, paclitaxel, and MG132).

Next, we combined our pharmacological dataset with publicly accessible data from the Genomics of Drug Sensitivity in Cancer (CRx) resource (Yang et al., 2013) and performed regression analysis against mass spectrometry-derived protein abundances to discover proteomic markers of drug sensitivity or resistance. We used hierarchical clustering to analyze global patterns among drug sensitivity-protein expression relationships, revealing many distinct clusters (Figure 7B). Drugs targeting proteins in the same pathway (e.g., BRAF and MEK inhibitors) showed similar correlation profiles. Interestingly, proteins that were part of the same pathways or complexes also clustered together, which did not occur using protein expression data alone (Figure 3A). The cluster that was highly enriched with mitochondrial proteins was associated with sensitivity to drugs that might depend on mitochondrial protein expression (belinostat, vorinostat, and obatoclax). For example, since protein acetylation is known to be enriched within the mitochondrial space, cells with more mitochondria might be more sensitive to deacetylase inhibition. In a similar vein, the cluster that was enriched with translation factors was associated with increased sensitivity to proteasome inhibitors MG132 and bortezomib. These results show that the integration of proteomics and drug sensitivity data using regression analysis provides a rich resource to identify unexpected modes of action and to discover new features of target pathways.
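A drug-protein association of this kind can be sketched in Python as a Pearson correlation between inverse IC50 and protein abundance, computed only over cell lines where both values exist, since missing IC50 values were not imputed. The function names, the minimum of three shared cell lines, and all numbers below are assumptions made for illustration; they are not taken from the analysis itself.

```python
from math import sqrt


def pearson_complete(xs, ys):
    """Pearson r over positions where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:  # assumed minimum number of shared cell lines
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def rank_markers(ic50_molar, abundance_by_protein):
    """Rank proteins by correlation of abundance with sensitivity (1 / IC50)."""
    sensitivity = [None if v is None else 1.0 / v for v in ic50_molar]
    scored = []
    for protein, abundances in abundance_by_protein.items():
        r = pearson_complete(sensitivity, abundances)
        if r is not None:
            scored.append((protein, r))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical IC50 values (molar) for one drug in five cell lines;
# None marks a cell line with no measurable IC50.
ic50_molar = [1e-6, 5e-7, None, 1e-8, 2e-6]

# Hypothetical iBAQ-like abundances for two placeholder proteins.
abundance_by_protein = {
    "PROT_A": [1.0, 2.0, 7.0, 100.0, 0.5],    # rises with sensitivity
    "PROT_B": [100.0, 50.0, 3.0, 1.0, 200.0],  # falls with sensitivity
}

ranked = rank_markers(ic50_molar, abundance_by_protein)
```

Proteins at the top of `ranked` are candidate sensitivity markers and those at the bottom are candidate resistance markers. Repeating this for every drug yields a drug-by-protein correlation matrix, which is the kind of input that two-way hierarchical clustering operates on.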

We used the regression analysis to select the most effective and robust drugs for known targets. For example, EGFR expression was, as expected, strongly associated with sensitivity to the EGFR inhibitor lapatinib in both drug screens (our data: R = 0.96, p = 2.36 × 10⁻⁹; CRx: R = 0.99, p = 6.2 × 10⁻⁴; Figure 7C). Proteomics data also can be used to uncover mechanisms of drug sensitivity. For example, several cell lines were hypersensitive to the drug bleomycin, an antibiotic used to treat plantar warts as well as many forms of cancer by inducing DNA damage. Expression of DDX60, an antiviral RNA/DNA helicase that binds cytosolic DNA (Miyashita et al., 2011), was most significantly associated with sensitivity to bleomycin (R = 0.99, p = 1.1 × 10⁻¹⁵) (Figure 7D).

We curated these drug sensitivity results to ask whether drug sensitivity associated with (1) genetic mutations or protein expression of the drug target itself, (2) proteins in the same pathway as the target, or (3) other literature-supported synthetic lethal interactions. Drug sensitivity associated strongly with both genomic and proteomic features of known targets. For example, we found that sensitivity to all-trans retinoic acid (ATRA) was correlated with the expression of its target protein RXRB (R = 0.98, p = 7.91 × 10⁻⁹). HCC1806 cells, which expressed the highest level of RXRB, were >200-fold more sensitive than the median cell line (Figure 7E). The cell line DU4475, which harbors the hyperactive BRAF-V600E mutation, was hypersensitive to both BRAF and MEK inhibitors (6,000-fold and 100,000-fold versus median, respectively) despite similar expression of the target proteins.

Another potential mechanism of drug sensitivity is synthetic lethality, in which the right combination of genetic, proteomic, or pharmacologic perturbations leads to cell death. Synthetic lethality tends to occur between proteins in the same pathway. For example, sensitivity to the AKT1/2 inhibitor MK-2206 was not associated with expression of AKT isoforms, but was significantly associated with expression of RPS6KB2 (R = 0.84, p = 3.54 × 10⁻⁴) (Figure 7F), which lies downstream in the signaling pathway (Shaw and Cantley, 2006). Sensitivity to other drugs correlated with proteins that are not known to be in the same pathway, but that have previously been proposed as synthetic lethal partners in genetic datasets. For example, poly-ADP ribose polymerase (PARP) inhibition disrupts DNA repair, leading to genotoxic stress and cellular senescence, a process shown to be accelerated in overactive AKT-signaling mutants (Chatterjee et al., 2013; Mendes-Pereira et al., 2009). In our data, AKT protein expression was also significantly correlated with sensitivity to PARP inhibition using AG-014699 (R = 0.74, p = 0.0014) (Figure 7G).

We explored how the differences in drug sensitivity and target expression between members of a signaling pathway relate to pathway structure. In the Akt-mTOR-S6K-signaling pathway, ribosomal protein S6 kinases (RPS6KB1/2) are activated by mTOR. Curiously, despite its association with MK-2206 sensitivity, expression of either RPS6KB1 or RPS6KB2 was inversely correlated with the S6K inhibitor PF-4708671 in luminal breast cancer cells (R = -0.96, p = 0.04) (Figure S5A). This is consistent with the suggestion that S6K inhibition may amplify upstream cancer signaling due to the chronic ablation of a negative feedback loop (Carracedo et al., 2008; Manning, 2004). Thus, the tumorigenic action of this protein may be best targeted indirectly

[Figure 7 graphic. Recoverable labels: clustering heatmap with axis "Correlation (iBAQ vs IC50⁻¹)" running from Resistance to Sensitivity, annotated with mismatch repair, oxidative phosphorylation, RTK signaling pathways, integrin signaling, and translation factor clusters. Panel titles: Direct target, ATRA (RXR agonist), R = 0.98, P = 7.91 × 10⁻⁹; Pathway targeted, (AKT1/2 inhibitor), R = 0.84, P = 3.54 × 10⁻⁴; Synthetic lethal, AG-014699 (PARP1/2 inhibitor), R = 0.74, P = 0.0014.]

Figure 7. Protein Expression and Drug Sensitivity

(A) Distribution of drug sensitivity (−log10 IC50) values across 16 TNBC cell lines for each drug in order of increasing median sensitivity. Drugs with sub-micromolar IC50 in at least one cell line are shown. Gray dots represent outlier values (>1.5× interquartile range).

(B) Hierarchical clustering of drug-protein associations. Pairwise Pearson's correlation was calculated systematically between drug sensitivity (inverted IC50) and protein abundance (iBAQ) values and clustered in both dimensions. Enriched gene ontology terms are shown for several clusters with Benjamini-Hochberg adjusted p value.

(C) Association of drug sensitivity with EGFR expression. The EGFR inhibitor lapatinib was significantly associated in both drug screen datasets (CRx: p = 6.2 × 10⁻⁴, our data: p = 2.4 × 10⁻⁹, FDR < 0.05).


(Figure S5B). Unlike RPS6KB2, RPS6KB1 expression did not correlate with AKT1/2 inhibitor MK-2206 sensitivity but instead was most highly correlated with the p21-activated kinase (PAK) inhibitor IPA-3 (R = 0.99, p = 1.91 × 10⁻¹²). Based on images from the Human Protein Atlas, RPS6KB1 and PAK2 are localized to the nucleus whereas RPS6KB2 and PAK1 are cytoplasmic (Uhlen et al., 2010). Thus, the reported activation of PAK1 downstream of S6K (Ishida et al., 2007) might be localized and isoform specific. Together, these results demonstrate that integrated analysis of drug sensitivity and protein expression provides a useful strategy for selecting drugs, finding diagnostic markers, and identifying potential mechanisms of cellular signaling. Further experimentation will be required to confirm these findings.

Finally, to demonstrate the potential clinical utility of these results, we asked how many proteins from the drug association analysis could be identified in primary tumors. We found that 73% (6,798/9,292) were quantifiable in the four clinical specimens we analyzed (Figure S5C). Of these, 494 were at least 5-fold more abundant than the average sample in at least one tumor. For example, the abundance of the protein kinase AKT2 was higher in one of the tumor samples than in any cell line analyzed in this study (Figure S5D).


Despite the success of large-scale "-omic" studies in providing molecular targets for therapeutic intervention, these studies have been limited by the lack of comprehensive protein data. Mass spectrometry-based proteomics has advanced rapidly, and it has become routine to reproducibly quantify near-complete proteomes using this technology. Here we used mass spectrometry to interrogate the proteomes of TNBC. We then integrated proteomics, genomics, and drug sensitivity data to study the effects of genomic aberrations in the proteome and build predictive models of drug response using proteomics.

This dataset is a useful resource to further explore the biology of TNBC. For example, many of the recently described metastatic stem cell pathways were highly expressed at the protein level in TNBC compared to luminal breast cells. The most invasive TNBC cells and solid tumors expressed low levels of proteins involved in cell proliferation and high levels of proteins involved in the epithelial-to-mesenchymal transition. Thus, the highly specialized nature of metastatic TNBC cells may be one reason they are so difficult to treat using conventional cytotoxic agents that target highly proliferative cells. Precise knowledge of the proteomes of these cells can guide the development of new drugs to target the metastatic transition.

Machine learning has become a useful tool to capture the molecular features responsible for differences in drug sensitivity (Barretina et al., 2012; Costello et al., 2014; Weinstein et al., 1997; Yang et al., 2013). Statistically significant differences in drug sensitivity based on cellular subtype have been observed (Lehmann et al., 2011), but the effect sizes are small compared to treatment strategies directed toward precise molecular insults. Examples include ERBB2 amplification (trastuzumab), BCR-ABL fusion (imatinib), or BRAF-V600E mutation (vemurafenib), all of which result in orders-of-magnitude increases in drug sensitivity. In practice, large effect sizes are needed to make an impact in the clinic. In this study, drug sensitivity and the expression of cancer-related proteins were not generally attributable to subtypes derived by clustering global protein profiles. Considering these cells were all derived from the same tissue type (breast) and were cultured in the same conditions, the dynamic range and specificity of protein expression for established regulatory proteins and drug targets was surprising. Using regression and prior knowledge to interrogate mechanisms of protein expression in drug sensitivity, we found that, in many cases, drug sensitivity was strongly correlated with the expression of the drug target itself (e.g., retinoic acid receptors, EGFR) or proteins in the same biological pathway (e.g., S6K expression as a marker for sensitivity to AKT inhibitors).

With the exception of drugs targeting amplified genes, the importance of protein expression in drug efficacy might be underestimated. While it is evident that the target of a drug must be expressed at some level in order for the drug to take effect, many drugs are developed with the assumption that the target is expressed at similar levels in all cells. Even in the case of gene amplification, CN does not fully account for differences in protein expression among specimens. In any case, quantitative analysis of drug targets and genetic abnormalities at the protein level might represent a useful addition to the current adjuvant therapy selection algorithm. Indeed, this is already routine for estrogen, progesterone, and epidermal growth factor receptor-2. Larger panels of cell lines will be necessary to capture rare genetic events and to enable more robust machine-learning approaches. This will facilitate the discovery of less obvious markers of drug sensitivity, such as synthetic lethal interactions. Proteomics also could provide an indispensable tool to rescue clinical trial results that do not improve patient outcomes in aggregate, but have many exceptional responses that might be due to underlying molecular features.

This study builds on other deep proteomic characterizations of cancer (Geiger et al., 2012b; Moghaddas Gholami et al., 2013; Nagaraj et al., 2011; Zhang et al., 2014) and represents the first deep proteome characterization targeting TNBC. With the

(D) Association of protein expression with bleomycin sensitivity. The protein DDX60 was significantly associated with bleomycin sensitivity (p = 1.1 × 10⁻¹⁵, FDR < 0.05).

(E-G) Pairwise comparisons of protein expression and drug sensitivity for three examples are shown. (E) Direct target, expression of the target protein indicates sensitivity to the drug; (F) pathway target, expression of a protein in the pathway of the drug target, but not the target itself, indicates sensitivity; (G) synthetic lethal, expression of a protein in an independent pathway from the drug target indicates sensitivity; (left) protein abundance (iBAQ) across cell lines; (right) drug sensitivity (inverse IC50, M⁻¹) across the same cell lines. RXRB, retinoid X receptor beta; RPS6KB2, ribosomal protein S6 kinase-2; AKT1, RAC-alpha serine/threonine-protein kinase; ATRA, RXR agonist all-trans retinoic acid; MK-2206, pan-isoform AKT inhibitor; AG-014699, poly-ADP ribose polymerase 1/2 inhibitor. Pearson's correlation and p value are indicated below the plots. CRx data from Yang et al. (2013). (A) includes only data generated in this study. For (B-G), data from the CRx were included. Missing IC50 values were not imputed.

development of large -omics approaches, personalized, predictive medicine is the prevailing direction of next-generation healthcare technology (Tian et al., 2012). Systematic, data-driven approaches are necessary to meet this goal. We anticipate that genome-scale nucleic acid sequencing and protein analysis will provide the basic molecular diagnostics toolbox for precision cancer medicine. TNBC is one of many unmet clinical needs that will benefit from future research in this area.


Samples were lysed in denaturing buffer and centrifuged at 12,000 x g for 10 min to pellet insoluble material. Protein extracts were reduced with 5 mM DTT at 55°C and alkylated with 15 mM iodoacetamide at room temperature in the dark. Extracts from each sample (25 mg) were diluted and digested in solution overnight with either LysC (Wako Pure Chemicals Industries) or sequencing-grade trypsin (Promega). Peptides were desalted and fractionated on StageTips (Rappsilber et al., 2007) by basic reverse-phase using a stepwise gradient of increasing acetonitrile (5%, 10%, 15%, 25%, and 80%) in 0.1% NH4OH. The resulting fractions were analyzed by LC-MS/MS.


Peptide fractions were analyzed on an EASY-nLC-1000 (Thermo Scientific) coupled to a hybrid quadrupole-orbitrap Q-Exactive mass spectrometer (Thermo Scientific) configured for data-dependent acquisition. Raw mass spectra were searched using Sequest (release 2012.01.0 of UW Sequest) against a concatenated forward and reverse version of the Uniprot human protein sequence database (v11/29/2012). Peptide spectral matches for all fractions corresponding to the same sample were filtered to reach a protein identification FDR of less than 1%, resulting in an aggregate peptide-level FDR of less than 0.1% for the entire dataset. Protein quantifications were calculated using the iBAQ approach (Schwanhausser et al., 2011).
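The principle behind searching a concatenated forward and reverse database can be sketched in Python. Matches to reversed (decoy) sequences estimate the number of false matches among forward (target) hits, so the FDR at a given score cutoff is approximated by the ratio of decoys to targets above that cutoff. The sketch below works at the level of individual spectral matches and assumes distinct scores; the function name and data are hypothetical, and the filtering applied in this study (which controlled FDR at the protein level) is not reproduced here.

```python
def score_threshold(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose accepted set has FDR <= max_fdr.

    psms is an iterable of (score, is_decoy) pairs, where a higher score
    means a better match. Returns None if no cutoff satisfies the limit.
    Assumes scores are distinct.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least `score`.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical spectral matches: (search score, matched a reversed sequence?).
psms = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]

strict_cutoff = score_threshold(psms, max_fdr=0.01)
loose_cutoff = score_threshold(psms, max_fdr=0.25)
```

With a strict limit the cutoff stops just above the first decoy, whereas a looser limit admits lower-scoring matches as long as the decoy-to-target ratio stays within bounds.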

Drug Screen and Curve Fitting

Compounds were added to cells using the CyBi-Well Vario Workstation (CyBio) and incubated at 37°C, 5% CO2 for 96 hr. Cell viability was measured by luminescence using quantitation of ATP as an indicator of metabolically active cells. Measurements were corrected for background luminescence and percentage cell viability was reported relative to the DMSO solvent control. Non-linear curve fitting was performed using MATLAB's nlinfit function (MathWorks). External drug sensitivity data (IC50) were downloaded from the Genomics of Drug Sensitivity in Cancer resource (Yang et al., 2013), release 2.0 (
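As an illustration of how an IC50 can be estimated from an eight-point dose series, the Python sketch below fits a two-parameter Hill curve by least squares. Several choices here are assumptions rather than details of this study: the model form (viability falling from 100% toward 0%), the parameter ranges, and the use of an exhaustive grid search in place of MATLAB's nlinfit, which was chosen only to keep the example dependency-free. The data are synthetic.

```python
from math import log10


def hill_viability(conc, ic50, slope):
    """Percent viability under an assumed two-parameter Hill model."""
    return 100.0 / (1.0 + (conc / ic50) ** slope)


def fit_ic50(concs, viabilities):
    """Least-squares grid search over log10(IC50) and the Hill slope."""
    best = None
    for i in range(141):               # log10(IC50) from -10 to -3, step 0.05
        ic50 = 10.0 ** (-10.0 + i / 20.0)
        for j in range(15):            # slope from 0.5 to 4.0, step 0.25
            slope = 0.5 + j / 4.0
            sse = sum((hill_viability(c, ic50, slope) - v) ** 2
                      for c, v in zip(concs, viabilities))
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]


# Eight concentrations (molar) spanning four orders of magnitude.
concs = [10.0 ** (-9.0 + 4.0 * k / 7.0) for k in range(8)]

# Synthetic, noise-free viabilities generated from IC50 = 100 nM, slope = 1.
viabilities = [hill_viability(c, 1e-7, 1.0) for c in concs]

fitted_ic50, fitted_slope = fit_ic50(concs, viabilities)
fitted_log_ic50 = log10(fitted_ic50)
```

A gradient-based fitter such as nlinfit would refine the parameters continuously rather than on a fixed grid. With real data it is also common to fit upper and lower plateaus, and to report no IC50 when viability never drops below 50% within the tested range.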

Statistical Analysis

Significance tests and correlation analysis were performed using built-in functions of Microsoft Office Excel 2013 or R statistical computing environment version 3.1.0. Gene enrichment significance testing was performed in DAVID version 6.7 using the EASE metric, a modified Fisher's exact test (Huang et al., 2009). All error bars represent SD unless otherwise noted.
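The idea behind the EASE metric can be sketched in Python: it is a one-sided Fisher's exact test made more conservative by removing one gene from the count of list members that fall in the category. The contingency-table layout below (gene list in one column, background in the other) follows our reading of the DAVID documentation and should be treated as an assumption, as should the function names and the example counts.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a or more counts in the top-left
    cell when the row and column totals are held fixed.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, x) * comb(total - row1, col1 - x)
                    for x in range(a, upper + 1))
    return numerator / comb(total, col1)


def enrichment_p(list_hits, list_total, pop_hits, pop_total, penalty=0):
    """Enrichment p value; penalty=1 gives the EASE-style conservative score."""
    a = max(list_hits - penalty, 0)
    return fisher_right_tail(a, pop_hits,
                             list_total - list_hits,
                             pop_total - pop_hits)


# Hypothetical counts: 3 of 300 list genes are in a category that contains
# 40 of 30,000 background genes.
fisher_p = enrichment_p(3, 300, 40, 30000)
ease_p = enrichment_p(3, 300, 40, 30000, penalty=1)
```

Because the penalty removes one supporting gene, `ease_p` is always at least as large as `fisher_p`. The effect is strongest for categories supported by only a few genes, which is where spurious enrichment is most likely.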


The raw mass spectrometry files associated with this work were deposited to the ProteomicsDB and are available under accession number PRDB004167.


Supplemental Information includes Supplemental Experimental Procedures, five figures, and three tables and can be found with this article online at


R.T.L. and J.V. designed the research. R.T.L., E.M.P., and K.M.H. performed proteomics experiments under J.V.'s supervision. C.P.M. performed drug sensitivity assays under C.A.B.'s supervision. H.Y.I. provided reagents. R.T.L. performed proteomics data analysis and integrative analysis. S.-I.L. analyzed drug sensitivity data and supervised statistical analysis. D.H. developed the web-based resource. R.T.L. and J.V. wrote the paper, and all authors edited it.


We thank Elisabeth Mahen and Chaozhong Song for excellent technical support, and members of the J.V. laboratory as well as Elizabeth O'Day for critical reading of the manuscript. This work was supported by a Howard Temin Pathway to Independence Award K99/R00 from NIH/NCI (R00CA140789) to J.V.; an Interdisciplinary Training in Genome Sciences grant from NIH/NHGRI (T32 HG00035 to R.T.L.); a National Science Foundation grant (DBI-1355899) to S.-I.L.; and funds from the South Sound CARE Foundation, the Washington Research Foundation, and the Gary E. Milgard Family Foundation to C.A.B.

Received: December 11, 2014 Revised: March 6, 2015 Accepted: March 23, 2015 Published: April 16, 2015


Arteaga, C.L., and Baselga, J. (2012). Impact of genomics on personalized cancer medicine. Clin. Cancer Res. 18, 612-618.

Banerji, S., Cibulskis, K., Rangel-Escareno, C., Brown, K.K., Carter, S.L., Frederick, A.M., Lawrence, M.S., Sivachenko, A.Y., Sougnez, C., Zou, L., et al. (2012). Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 486, 405-409.

Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A.A., Kim, S., Wilson, C.J., Lehar, J., Kryukov, G.V., Sonkin, D., et al. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603-607.

Beck, M., Schmidt, A., Malmstroem, J., Claassen, M., Ori, A., Szymborska, A., Herzog, F., Rinner, O., Ellenberg, J., and Aebersold, R. (2011). The quantitative proteome of a human cell line. Mol. Syst. Biol. 7, 549.

Brown, E.J., and Frazier, W.A. (2001). Integrin-associated protein (CD47) and its ligands. Trends Cell Biol. 11, 130-135.

Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature 490, 61-70.

Carracedo, A., Ma, L., Teruya-Feldstein, J., Rojo, F., Salmena, L., Alimonti, A., Egia, A., Sasaki, A.T., Thomas, G., Kozma, S.C., et al. (2008). Inhibition of mTORC1 leads to MAPK pathway activation through a PI3K-dependent feedback loop in human cancer. J. Clin. Invest. 118, 3065-3074.

Chatterjee, P., Choudhary, G.S., Sharma, A., Singh, K., Heston, W.D., Ciezki, J., Klein, E.A., and Almasan, A. (2013). PARP inhibition sensitizes to low dose-rate radiation TMPRSS2-ERG fusion gene-expressing and PTEN-deficient prostate cancer cells. PLoS ONE 8, e60408.

Costello, J.C., Heiser, L.M., Georgii, E., Gonen, M., Menden, M.P., Wang, N.J., Bansal, M., Ammad-ud-din, M., Hintsanen, P., Khan, S.A., et al.; NCI DREAM Community (2014). A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202-1212.

Curtis, C., Shah, S.P., Chin, S.-F., Turashvili, G., Rueda, O.M., Dunning, M.J., Speed, D., Lynch, A.G., Samarajiwa, S., Yuan, Y., et al.; METABRIC Group (2012). The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346-352.

Forbes, S.A., Bindal, N., Bamford, S., Cole, C., Kok, C.Y., Beare, D., Jia, M., Shepherd, R., Leung, K., Menzies, A., et al. (2011). COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945-D950.

Frame, M.C., Patel, H., Serrels, B., Lietha, D., and Eck, M.J. (2010). The FERM domain: organizing the structure and function of FAK. Nat. Rev. Mol. Cell Biol. 11, 802-814.

Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N., and Stratton, M.R. (2004). A census of human cancer genes. Nat. Rev. Cancer 4, 177-183.

Geiger, T., Madden, S.F., Gallagher, W.M., Cox, J., and Mann, M. (2012a). Proteomic portrait of human breast cancer progression identifies novel prognostic markers. Cancer Res. 72, 2428-2439.

Geiger, T., Wehner, A., Schaab, C., Cox, J., and Mann, M. (2012b). Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol. Cell. Proteomics 11, 014050.

Gygi, S.P., Rochon, Y., Franza, B.R., and Aebersold, R. (1999). Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19, 1720-1730.

Howlader, N., Altekruse, S.F., Li, C.I., Chen, V.W., Clarke, C.A., Ries, L.A.G., and Cronin, K.A. (2014). US incidence of breast cancer subtypes defined by joint hormone receptor and HER2 status. J. Natl. Cancer Inst. 106, dju055.

Huang, W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44-57.

Huber, M.A., Azoitei, N., Baumann, B., Grünert, S., Sommer, A., Pehamberger, H., Kraut, N., Beug, H., and Wirth, T. (2004). NF-kappaB is essential for epithelial-mesenchymal transition and metastasis in a model of breast cancer progression. J. Clin. Invest. 114, 569-581.

Hudis, C.A., and Gianni, L. (2011). Triple-negative breast cancer: an unmet medical need. Oncologist 16 (1), 1-11.

Ishida, H., Li, K., Yi, M., and Lemon, S.M. (2007). p21-activated kinase 1 is activated through the mammalian target of rapamycin/p70 S6 kinase pathway and regulates the replication of hepatitis C virus in human hepatoma cells. J. Biol. Chem. 282, 11836-11848.

Kennedy, J.J., Abbatiello, S.E., Kim, K., Yan, P., Whiteaker, J.R., Lin, C., Kim, J.S., Zhang, Y., Wang, X., Ivey, R.G., et al. (2014). Demonstrating the feasibility of large-scale development of standardized assays to quantify human proteins. Nat. Methods 11, 149-155.

Kenny, P.A., Lee, G.Y., Myers, C.A., Neve, R.M., Semeiks, J.R., Spellman, P.T., Lorenz, K., Lee, E.H., Barcellos-Hoff, M.H., Petersen, O.W., et al. (2007). The morphologies of breast cancer cell lines in three-dimensional assays correlate with their profiles of gene expression. Mol. Oncol. 1, 84-96.

Kim, M.-S., Pinto, S.M., Getnet, D., Nirujogi, R.S., Manda, S.S., Chaerkady, R., Madugundu, A.K., Kelkar, D.S., Isserlin, R., Jain, S., et al. (2014). A draft map of the human proteome. Nature 509, 575-581.

Lehmann, B.D., Bauer, J.A., Chen, X., Sanders, M.E., Chakravarthy, A.B., Shyr, Y., and Pietenpol, J.A. (2011). Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J. Clin. Invest. 121, 2750-2767.

Lin, X., Duan, X., Liang, Y.-Y., Su, Y., Wrighton, K.H., Long, J., Hu, M., Davis, C.M., Wang, J., Brunicardi, F.C., et al. (2006). PPM1A functions as a Smad phosphatase to terminate TGFbeta signaling. Cell 125, 915-928.

Luo, J.-L., Maeda, S., Hsu, L.-C., Yagita, H., and Karin, M. (2004). Inhibition of NF-kappaB in cancer cells converts inflammation-induced tumor growth mediated by TNFalpha to TRAIL-mediated tumor regression. Cancer Cell 6,


Maier, T., Güell, M., and Serrano, L. (2009). Correlation of mRNA and protein in complex biological samples. FEBS Lett. 583, 3966-3973.

Manning, B.D. (2004). Balancing Akt with S6K: implications for both metabolic diseases and tumorigenesis. J. Cell Biol. 167, 399-403.

McLean, G.W., Carragher, N.O., Avizienyte, E., Evans, J., Brunton, V.G., and Frame, M.C. (2005). The role of focal-adhesion kinase in cancer - a new therapeutic opportunity. Nat. Rev. Cancer 5, 505-515.

Mendes-Pereira, A.M., Martin, S.A., Brough, R., McCarthy, A., Taylor, J.R., Kim, J.-S., Waldman, T., Lord, C.J., and Ashworth, A. (2009). Synthetic lethal targeting of PTEN mutant cells with PARP inhibitors. EMBO Mol. Med. 1, 315-322.

Miyashita, M., Oshiumi, H., Matsumoto, M., and Seya, T. (2011). DDX60, a DEXD/H box helicase, is a novel antiviral factor promoting RIG-I-like receptor-mediated signaling. Mol. Cell. Biol. 31, 3802-3819.

Moghaddas Gholami, A., Hahne, H., Wu, Z., Auer, F.J., Meng, C., Wilhelm, M., and Kuster, B. (2013). Global proteome analysis of the NCI-60 cell line panel. Cell Rep. 4, 609-620.

Nagaraj, N., Wisniewski, J.R., Geiger, T., Cox, J., Kircher, M., Kelso, J., Paabo, S., and Mann, M. (2011). Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 7, 548.

Neve, R.M., Chin, K., Fridlyand, J., Yeh, J., Baehner, F.L., Fevr, T., Clark, L., Bayani, N., Coppe, J.-P., Tong, F., et al. (2006). A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515-527.

Perou, C.M., Sørlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., et al. (2000). Molecular portraits of human breast tumours. Nature 406, 747-752.

Prat, A., and Perou, C.M. (2011). Deconstructing the molecular portraits of breast cancer. Mol. Oncol. 5, 5-23.

Prat, A., Parker, J.S., Karginova, O., Fan, C., Livasy, C., Herschkowitz, J.I., He, X., and Perou, C.M. (2010). Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 12, R68.

Quail, D.F., and Joyce, J.A. (2013). Microenvironmental regulation of tumor progression and metastasis. Nat. Med. 19, 1423-1437.

Rappsilber, J., Mann, M., and Ishihama, Y. (2007). Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2, 1896-1906.

Sakurai, H., Chiba, H., Miyoshi, H., Sugita, T., and Toriumi, W. (1999). IkappaB kinases phosphorylate NF-kappaB p65 subunit on serine 536 in the transactivation domain. J. Biol. Chem. 274, 30353-30356.

Schwanhäusser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W., and Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature 473, 337-342.

Shaw, R.J., and Cantley, L.C. (2006). Ras, PI(3)K and mTOR signalling controls tumour cell growth. Nature 441, 424-430.

Sick, E., Jeanne, A., Schneider, C., Dedieu, S., Takeda, K., and Martiny, L. (2012). CD47 update: a multifaceted actor in the tumour microenvironment of potential therapeutic interest. Br. J. Pharmacol. 167, 1415-1430.

Stagg, J., Divisekera, U., McLaughlin, N., Sharkey, J., Pommey, S., Denoyer, D., Dwyer, K.M., and Smyth, M.J. (2010). Anti-CD73 antibody therapy inhibits breast tumor growth and metastasis. Proc. Natl. Acad. Sci. USA 107, 1547-1552.

Subik, K., Lee, J.-F., Baxter, L., Strzepek, T., Costello, D., Crowley, P., Xing, L., Hung, M.-C., Bonfiglio, T., Hicks, D.G., and Tang, P. (2010). The Expression Patterns of ER, PR, HER2, CK5/6, EGFR, Ki-67 and AR by Immunohistochemical Analysis in Breast Cancer Cell Lines. Breast Cancer (Auckl) 4, 35-41.

Tian, Q., Price, N.D., and Hood, L. (2012). Systems cancer medicine: towards realization of predictive, preventive, personalized and participatory (P4) medicine. J. Intern. Med. 271, 111-121.

Tibes, R., Qiu, Y., Lu, Y., Hennessy, B., Andreeff, M., Mills, G.B., and Kornblau, S.M. (2006). Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol. Cancer Ther. 5, 2512-2521.

Uhlén, M., Oksvold, P., Fagerberg, L., Lundberg, E., Jonasson, K., Forsberg, M., Zwahlen, M., Kampf, C., Wester, K., Hober, S., et al. (2010). Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 28, 1248-1250.

Untch, M., Konecny, G.E., Paepke, S., and von Minckwitz, G. (2014). Current and future role of neoadjuvant therapy for breast cancer. Breast 23, 526-537.

Vidal, M., Chan, D.W., Gerstein, M., Mann, M., Omenn, G.S., Tagle, D., and Sechi, S.; Workshop Participants (2012). The human proteome - a scientific opportunity for transforming diagnostics, therapeutics, and healthcare. Clin. Proteomics 9, 6.

Vogelstein, B., Papadopoulos, N., Velculescu, V.E., Zhou, S., Diaz, L.A., Jr., and Kinzler, K.W. (2013). Cancer genome landscapes. Science 339, 1546-1558.

Vranic, S., Gatalica, Z., and Wang, Z.-Y. (2011). Update on the molecular profile of the MDA-MB-453 cell line as a model for apocrine breast carcinoma studies. Oncol. Lett. 2, 1131-1137.

Weinstein, J.N., Myers, T.G., O'Connor, P.M., Friend, S.H., Fornace, A.J., Jr., Kohn, K.W., Fojo, T., Bates, S.E., Rubinstein, L.V., Anderson, N.L., et al. (1997). An information-intensive approach to the molecular pharmacology of cancer. Science 275, 343-349.

Wilhelm, M., Schlegl, J., Hahne, H., Moghaddas Gholami, A., Lieberenz, M., Savitski, M.M., Ziegler, E., Butzmann, L., Gessulat, S., Marx, H., et al. (2014). Mass-spectrometry-based draft of the human proteome. Nature 509, 582-587.

Yang, W., Soares, J., Greninger, P., Edelman, E.J., Lightfoot, H., Forbes, S., Bindal, N., Beare, D., Smith, J.A., Thompson, I.R., et al. (2013). Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955-D961.

Yuan, Y., Van Allen, E.M., Omberg, L., Wagle, N., Amin-Mansour, A., Sokolov, A., Byers, L.A., Xu, Y., Hess, K.R., Diao, L., et al. (2014). Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat. Biotechnol. 32, 644-652.

Zhang, B., Wang, J., Wang, X., Zhu, J., Liu, Q., Shi, Z., Chambers, M.C., Zimmerman, L.J., Shaddox, K.F., Kim, S., et al.; NCI CPTAC (2014). Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382-387.

Zhi, X., Wang, Y., Zhou, X., Yu, J., Jian, R., Tang, S., Yin, L., and Zhou, P. (2010). RNAi-mediated CD73 suppression induces apoptosis and cell-cycle arrest in human breast cancer cells. Cancer Sci. 101, 2561-2569.

Zimmermann, H. (1992). 5'-Nucleotidase: molecular structure and functional aspects. Biochem. J. 285, 345-365.