


Neoplasia. Vol. 4, No. 2, 2002, pp. 121-128

Transcriptional Analyses of Barrett's Metaplasia and Normal Upper GI Mucosae

Michael T. Barrett*, Ka Yee Yeung†, Walter L. Ruzzo†, Li Hsu‡, Patricia L. Blount‡, Robert Sullivan*, Helmut Zarbl*‡, Jeffrey Delrow§, Peter S. Rabinovitch¶ and Brian J. Reid*‡#**

*Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; †Department of Computer Science, University of Washington, Seattle, WA, USA; Divisions of ‡Public Health Sciences, §DNA Array Facility, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Departments of ¶Pathology, #Medicine (Gastroenterology Division), **Genetics, University of Washington, Seattle, WA, USA


Over the last two decades, the incidence of esophageal adenocarcinoma (EA) has increased dramatically in the US and Western Europe. It has been shown that EAs evolve from premalignant Barrett's esophagus (BE) tissue by a process of clonal expansion and evolution. However, the molecular phenotype of the premalignant metaplasia, and its relationship to those of the normal upper gastrointestinal (GI) mucosae, including gastric, duodenal, and squamous epithelium of the esophagus, has not been systematically characterized. Therefore, we used oligonucleotide-based microarrays to characterize gene expression profiles in each of these tissues. The similarity of BE to each of the normal tissues was compared using a series of computational approaches. Our analyses included esophageal squamous epithelium, which is present at the same anatomic site and exposed to similar conditions as Barrett's epithelium, duodenum that shares morphologic similarity to Barrett's epithelium, and adjacent gastric epithelium. There was a clear distinction among the expression profiles of gastric, duodenal, and squamous epithelium whereas the BE profiles showed considerable overlap with normal tissues. Furthermore, we identified clusters of genes that are specific to each of the tissues, to the Barrett's metaplastic epithelia, and a cluster of genes that was distinct between squamous and non-squamous epithelia.

Neoplasia (2002) 4, 121-128 DOI: 10.1038/sj/neo/7900221

Keywords: Barrett's esophagus, microarray, clustering, expression, premalignant.


Introduction

Barrett's esophagus (BE) is a condition in which the stratified squamous epithelium of the esophagus is replaced by metaplastic columnar epithelium. Barrett's metaplasia develops as a complication in approximately 10% of persons with chronic gastroesophageal reflux disease (GERD) and predisposes to the development of esophageal adenocarcinoma (EA). The development of Barrett's metaplasia is fundamentally related to tissue differentiation. The phenotype of Barrett's metaplasia has been described by histologic, electron microscopic, immunohistochemical, and biochemical studies, and the results show a surprisingly complex epithelium that shares features with duodenal, gastric, and squamous esophageal epithelia. By electron microscopy, Barrett's metaplasia resembles small intestine with goblet cells and intervening "pseudoabsorptive" cells that have a variably developed brush border [1,2]. Biochemical studies have confirmed that Barrett's metaplasia expresses villin, sucrase isomaltase, and hydrolase aminopeptidase, which are also found in small intestine, but not esophageal squamous epithelium [3-5]. Barrett's metaplasia also has some features in common with gastric mucosa, including mucus secretory capacity and mucus granules [1]. However, Barrett's metaplasia also shares some features with squamous esophageal cells, including expression of both squamous and columnar cytokeratins [6]. Further, the squamocolumnar junction in persons with BE can have a unique multilayered epithelium with features of both squamous and columnar cells, including cytokeratin staining [7].

Other phenotypic aspects of Barrett's metaplasia include cellular hyperproliferation that has been confirmed by a number of methods, including immunohistochemistry, flow cytometry, and bromodeoxyuridine (BrdU) and tritiated thymidine labeling techniques. In addition, Barrett's metaplasia typically arises in the setting of chronic esophageal reflux disease with erosive esophagitis and denuded regions of squamous epithelium. Finally, there is evidence that the metaplastic epithelium can undergo extensive clonal expansion to occupy large regions of esophageal mucosa [8,9].

Recent microarray studies have shown that cancers, although highly variable, can be categorized into different classes based on the presence of distinctive expression signatures (reviewed in Ref. [10]). However, little is known about the molecular phenotype of human metaplasia in vivo. The ability to sample Barrett's epithelium and the surrounding normal tissues provides a unique in vivo human model in which to use microarray technology to compare a premalignant metaplastic tissue with the surrounding normal upper gastrointestinal (GI) tissues, including squamous, gastric, and duodenal epithelia.

Abbreviations: BE, Barrett's esophagus; DUO, duodenum; EA, esophageal adenocarcinoma; GAS, gastric; GI, gastrointestinal; FOM, figure of merit; SQ, squamous

Address all correspondence to: Dr. Brian J. Reid, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave North, Mail Stop C1-157, Seattle, WA 98109, USA. E-mail:

Received 22 August 2001; Accepted 14 September 2001.

Copyright © 2002 Nature Publishing Group All rights reserved 1522-8002/02 $25.00

Materials and Methods

Tissue Collection

Endoscopic biopsies (four to six biopsies per patient) from each tissue, esophageal squamous, gastric, duodenum, and Barrett's epithelia, were collected from a series of patients during endoscopic surveillance in the Seattle Barrett's Esophagus Study. The Seattle Barrett's Esophagus Study was approved by the Human Subjects Division of the University of Washington in 1983 and renewed annually thereafter with reciprocity from the IRB of the Fred Hutchinson Cancer Research Center since 1993. Samples were immediately placed in RNAlater (Ambion, Woodlands, TX) then stored at 4°C for up to 1 week or at -20°C for longer periods of time until processing.

RNA Extraction and cDNA Preparation

Endoscopic biopsies of each tissue were pooled (two to four patients per pool) prior to extraction. We collected sufficient material for four pools each of BE and of esophageal squamous epithelium, and three pools each of gastric and duodenal biopsies. All samples were snap-frozen in liquid nitrogen then ground into a fine powder. Each sample was homogenized by resuspension in lysis solution and passaged through a Qiashredder (Qiagen, Valencia, CA) column. Total RNA was extracted with the Qiagen RNeasy Midi kit using the supplier's protocol. Poly A+ RNA was prepared by oligo dT chromatography (oligo dT cellulose NEB, Beverly, MA; Poly-prep chromatography columns; Bio-Rad, Hercules, CA) from pooled samples (two to four patients per pool) of BE (four pools), esophageal squamous epithelium (four pools), gastric (three pools), and duodenum (three pools).

For each sample, double-stranded cDNA was prepared with Gibco-BRL Superscript II (Life Technologies, Rockville, MD) using 1.5 µg of mRNA as template. Subsequently, biotin-labeled cRNA was generated using either the Ambion MEGAscript T7 kit or the ENZO Bioarray RNA transcript labeling kit (Affymetrix, Santa Clara, CA). All in vitro transcription (IVT) reactions were carried out for 4-5 hours according to the supplier's instructions. All RNA and cRNA samples were verified by ethidium bromide-stained gel analysis and quantified by SyBrII (Molecular Probes, Eugene, OR) fluorescence.

Array Hybridization

A total of 25 to 50 µg of each cRNA preparation was fragmented for 35 minutes at 94°C in buffer [40 mM Tris-acetate (pH 8.1)/100 mM magnesium acetate]. Fifteen micrograms of each cRNA was mixed with hybridization buffer to a final volume of 300 µl. Two different Affymetrix GeneChip arrays, Hu6800 and HuGeneFL, were used in this study. Each of these arrays contains probes for the same approximately 7000 genes. Arrays were hybridized, washed, and scanned according to the manufacturer's instructions. Scanned output files for each independent experiment were visually inspected for hybridization artifacts and then analyzed by GeneChip 3.1 software using a global scaling factor of 100.

Data Normalization and Correlation Analysis

Four separate chips (A, B, C, D) are required to interrogate all the genes in the Hu6800 format. On the individual Hu6800 chips, we observed considerable variation in the relative means and standard deviations of hybridization intensities, even with the same tissue. In order to rigorously compare the expression patterns across all experiments, the absolute intensities of the probe sets on each array have to be normalized. However, a major difficulty for normalization is that only a few probe sets are common to the four separate chips of the Hu6800 format. Therefore, we used the data from the higher-density HuGeneFL chips to determine relative intensities of genes on each of the A, B, C, D chips in order to compare the expression levels of genes on different chips in the same experiment. In our initial analysis, one pool of the gastric sample (GAS1), one pool of the duodenum sample (DUO1), four pools of the Barrett's epithelium (BE1-4), and four pools of the squamous (Sq1-4) samples were hybridized to the Hu6800 arrays, whereas two pools of the duodenum (DUO2,3) and two pools of the gastric (GAS2,3) samples were hybridized to the HuGeneFL chips. In order to normalize across all experiments, we rehybridized one sample each of BE (BE5) and of squamous (Sq5) to HuGeneFL chips.

Because we have multiple experiments that include sets with different chip formats (HU and FL) on each tissue type, we averaged the normalized expression levels of the same tissue types in each set of experiments. The averaged normalized expression levels of all the genes on the arrays were used to calculate the sample correlation coefficient for each pair of tissue types in each set of experiments. The sample correlation coefficient is a point estimate of the true correlation coefficient between two tissue types, but it does not convey any uncertainty about the value of the estimate. Therefore, we also computed the 95% confidence intervals to obtain a more robust comparison of the similarities between tissues. Consequently, two nonoverlapping confidence intervals suggest that one pair of tissue types is more similar than the other pair with high probability. A detailed description of these analyses is given in Yeung et al. [11] and

Data Filtering

In order to identify genes that vary significantly across the different tissue types for subsequent clustering analyses, we filtered the entire normalized data set. We employed a modified analysis of variance (ANOVA) procedure: for each gene, we computed the ratio of the between-tissue mean square to the residual mean square. If the ratio is greater than a threshold, the gene passes the filter and is said to vary significantly across the different tissue samples. The significance threshold is determined by an empirical distribution (generated by randomly permuting the expression levels across different tissues) at a given significance level. At the 5% significance level, 1095 genes passed through our filter and were subsequently evaluated by clustering algorithms.

Table 1. Correlation Coefficients of Tissue Similarities.

| Chip Format | Tissues | Point Estimate | 95% CI |
|---|---|---|---|
| HU | GAS1, DUO1 | 0.807 | [0.789, 0.824] |
| HU | GAS1, Sq(1-4)* | 0.751 | [0.730, 0.771] |
| HU | DUO1, Sq(1-4) | 0.732 | [0.709, 0.753] |
| HU | BE(1-4)*, GAS1 | 0.851 | [0.839, 0.863] |
| HU | BE(1-4), DUO1 | 0.841 | [0.827, 0.853] |
| HU | BE(1-4), Sq(1-4) | 0.830 | [0.817, 0.842] |
| FL | GAS(2,3)†, DUO(2,3)† | 0.861 | [0.851, 0.870] |
| FL | GAS(2,3), Sq5 | 0.777 | [0.760, 0.793] |
| FL | DUO(2,3), Sq5 | 0.748 | [0.729, 0.765] |
| FL | BE5, GAS(2,3) | 0.863 | [0.853, 0.873] |
| FL | BE5, DUO(2,3) | 0.872 | [0.863, 0.881] |
| FL | BE5, Sq5 | 0.796 | [0.780, 0.810] |

*Average of four experiments with Hu6800 chips. †Average of two experiments with HuGeneFL chips.

Selecting Clustering Algorithms

In order to identify tissue-specific genes, we would like to apply a clustering method to assign genes with similar expression profiles into groups. Because no clustering algorithm has emerged as the method of choice for gene expression data, we applied the figure of merit (FOM) methodology [11] to compare the performance of a few popular clustering algorithms on the filtered normalized data, including three hierarchical clustering algorithms (average link, single link, complete link) [12], two partitional algorithms [k-means and Cluster Affinity Search Technique (CAST) [12,13]], and the random algorithm. The latter is a benchmark that randomly assigns genes to clusters. The idea of the FOM is to apply a clustering algorithm to all but one experiment in the data. The expression levels from the excluded experiment are used to assess the predictive power of the resulting clusters: meaningful clusters are expected to exhibit less variation in the excluded experiment than clusters formed by chance. The predictive power is measured by the within-cluster variance, and is called FOM. Each experiment is left out in turn, and the total FOM over all experiments is computed. A clustering result with a small FOM implies low within-cluster variance, which in turn is an indication of high predictive power. From the FOM analysis, the CAST algorithm with eight clusters produces relatively high-quality clusters on the filtered normalized data. Before applying cluster analysis, we normalized the expression levels of each gene by subtracting the mean of the expression levels over all experiments and then dividing by the standard deviations of the expression levels over all experiments.
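The leave-one-out FOM comparison can be illustrated with a short Python sketch on synthetic data. It assumes that the FOM for one excluded experiment is the root mean squared deviation of each gene from its cluster mean in that experiment, summed over all excluded experiments. The function names (kmeans_labels, random_labels, total_fom) and the simple k-means routine are hypothetical stand-ins written for this illustration; CAST itself is not implemented here.

```python
import numpy as np

def kmeans_labels(data, k, rng, n_iter=50):
    """Plain Lloyd's k-means (illustrative); returns one label per gene (row)."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def random_labels(data, k, rng):
    """Benchmark: assign every gene to a cluster at random."""
    return rng.integers(0, k, size=len(data))

def total_fom(data, k, cluster_fn, seed=0):
    """Sum of per-experiment FOM values; smaller means more predictive clusters."""
    rng = np.random.default_rng(seed)
    n_genes, n_exps = data.shape
    total = 0.0
    for e in range(n_exps):
        training = np.delete(data, e, axis=1)   # cluster without experiment e
        labels = cluster_fn(training, k, rng)
        held_out = data[:, e]                   # evaluate on experiment e
        sq_dev = 0.0
        for j in np.unique(labels):
            values = held_out[labels == j]
            sq_dev += ((values - values.mean()) ** 2).sum()
        total += np.sqrt(sq_dev / n_genes)
    return total

# Synthetic data: 3 groups of 40 genes measured in 8 experiments (assumption)
rng = np.random.default_rng(1)
profiles = rng.normal(scale=3.0, size=(3, 8))
data = np.repeat(profiles, 40, axis=0) + rng.normal(scale=0.5, size=(120, 8))

fom_kmeans = total_fom(data, 3, kmeans_labels)
fom_random = total_fom(data, 3, random_labels)
print(f"k-means FOM: {fom_kmeans:.2f}, random FOM: {fom_random:.2f}")
```

On data with real group structure, a useful algorithm should yield a lower total FOM than the random benchmark, which is the comparison the text relies on when choosing among algorithms and cluster numbers.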

Results

Similarity Between Different Upper GI Tissues and Barrett's Epithelium

We investigated the distinction between metaplastic Barrett's tissue samples and each of the three normal upper GI tissue samples using the Pearson correlation coefficient [14] (Table 1). Furthermore, we summarized the relationships of the point estimates of the sample correlation coefficients, using all the genes on the arrays, for the different tissues as a hierarchical dendrogram in Figure 1. The pairwise comparisons of our first set of experiments (HU format) between the averaged normalized gastric and duodenum (0.807), gastric and squamous (0.751), and duodenum and squamous (0.732) showed that duodenum and gastric epithelium are more related to each other at the transcriptional level than either is to squamous epithelium (Table 1). Furthermore, the confidence intervals for the correlation coefficients of gastric versus squamous epithelium and of duodenum versus squamous epithelium do not overlap with the confidence interval for gastric versus duodenum. The results for our second set of experiments (FL format) are similar.
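As an illustration of this kind of comparison, the following Python sketch computes a Pearson correlation coefficient with an approximate 95% confidence interval and checks whether two intervals overlap. The interval here uses the Fisher z-transformation, which is an assumption of this sketch; the procedure actually used in the study is described in Ref. [11] and may differ. The function names and the synthetic profiles are hypothetical.

```python
import math
import numpy as np

def pearson_with_ci(x, y, z_crit=1.959964):
    """Pearson r of two expression profiles with an approximate 95% CI.

    The CI uses the Fisher z-transformation (an assumption of this sketch,
    not necessarily the method used in the paper).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 4:
        raise ValueError("need two 1-D profiles of equal length >= 4")
    r = float(np.corrcoef(x, y)[0, 1])
    z = math.atanh(r)                    # undefined for |r| == 1
    se = 1.0 / math.sqrt(x.size - 3)
    return r, (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))

def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Synthetic stand-ins for averaged, normalized expression levels (assumption)
rng = np.random.default_rng(0)
shared = rng.normal(size=7000)
tissue_a = shared + 0.5 * rng.normal(size=7000)
tissue_b = shared + 0.5 * rng.normal(size=7000)

r, (lo, hi) = pearson_with_ci(tissue_a, tissue_b)
print(f"r = {r:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")

# Intervals taken from Table 1 (HU format)
gas_duo, gas_sq = (0.789, 0.824), (0.730, 0.771)
print(intervals_overlap(gas_duo, gas_sq))
```

Applied to the Table 1 intervals, the overlap check reproduces the reading in the text: the gastric versus duodenum interval does not overlap the gastric versus squamous interval.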

We observed variability in the similarity between individual pools of BE and pools of normal tissues (Table 1) and http:// For example, BE1 has higher point estimates of correlation coefficients to each of the four squamous pools (0.808, 0.810, 0.802, 0.820) than to gastric (0.799), whereas BE4 has higher correlation coefficients to gastric (0.820) than to the four squamous pools (0.754, 0.753, 0.750, 0.788), with three nonoverlapping confidence intervals. The greater similarity of BE4 with the gastric tissues compared to squamous epithelium was also observed in the replicate experiment (BE5) with the identical cRNA using the FL chip format [BE5, GAS(2,3) (0.863) and BE5, Sq5 (0.796)].

Figure 1. Hierarchical clustering of tissues based on point estimates of the Pearson correlation coefficients using all the genes represented by the entire probe set of the Affymetrix (Hu6800 and HuGeneFL) arrays. All samples in italics (Sq1-4, DUO1, GAS1, and BE1-4) were hybridized to Hu6800 arrays, whereas Sq5, DUO2,3, GAS2,3, and BE5 were hybridized to HuGeneFL arrays.

Figure 2. (A) Expression profiles of five tissue-specific clusters (the expression profiles of all eight clusters are available at barretts/neoplasia). The horizontal axis represents the pooled samples of the different tissues analyzed, and the vertical axis represents the normalized expression levels (see Materials and Methods). A high normalized expression level indicates relatively high expression levels compared to other experiments for the same gene. Within each of the five clusters (I-V), the average normalized expression levels (solid lines) ±1 SD (dotted lines) across the 16 experiments are shown. (B) Visualization of the five clusters in a reduced dimensional space. The reduced dimensional space is formed by the first three principal components (PCs), which capture most of the variation in the original data and are therefore typically used in visualization of high dimensional data from multiple experiments. In the present study, 67% of the variation from the 16 separate hybridizations was captured in the first three PCs. BE-specific cluster (orange filled circles); gastric-specific cluster (purple circles); duodenum-specific cluster (filled pink rectangles); squamous-specific cluster (green rectangles); nonsquamous epithelium cluster (green crosses).

Tissue-Specific Clusters

Figure 2 shows five tissue-specific clusters (out of eight clusters) from applying the CAST algorithm on the filtered normalized data with 1095 genes. The five tissue-specific clusters included clusters of tissue-specific genes whose expression was elevated in each of the corresponding four GI tissues and a cluster of genes that had increased expression in nonsquamous epithelium relative to squamous epithelia (Figure 2A). The complete data set for all eight clusters is available (

Table 2. Barrett's Epithelium-Specific Genes.

| Gene | Function | Gb Number |
|---|---|---|
| TGF-β superfamily protein | Transcription factor | AB000584 |
| P1cdc47 | S-phase regulation | D55716 |
| Calcyclin | Calcium-binding protein | J02763 |
| Mucin (gastric) | Protective cell membrane barrier | U97698 |
| Glucagon | Stimulation of glycogenolysis and gluconeogenesis | J04040 |
| Activating transcription factor 3 (ATF3) | Transcription factor (leucine zipper) | L19871 |
| Autoantigen pericentriol material (PCM-1) | Centrosome autoantigen | L27841 |
| Thyroid receptor interactor (TRIP14) | Bind to and activate RNase L, resulting in general RNA degradation and consequent inhibition of protein synthesis. 2-5As are produced by a well-conserved family of interferon-induced enzymes, the 2-5A synthetases or OASs | L40387 |
| Mesothelial keratin K7 (type II) | Simple epithelial keratin | M13955 |
| IgE-binding protein (epsilon-BP) | | M57710 |
| Epidermal surface antigen (ESA) | Cell adhesion | M60922 |
| Desmin | Subunits of the intermediate filaments | M63391 |
| Adipsin/complement factor D | Serine protease that is secreted by adipocytes into the bloodstream | M84526 |
| LUCA-1/HYAL1 | Principal glycosaminoglycans of the extracellular matrix, modulation of cell proliferation, migration, and differentiation | U03056 |
| 17β-hydroxysteroid dehydrogenase 3 | Lipid metabolism; androgen and estrogen metabolism | U05659 |
| Mesothelin CAK1 antigen precursor | Tumor antigen, cellular adhesion | U40434 |
| Small GTP-binding protein rab27b | Membrane-bound proteins involved in vesicular fusion and trafficking | U57093 |
| Cyr61 | Angiogenesis, immediate-early response, heparin binding, α(v)β3 integrin ligand | U62015 |
| Nedd-4-like ubiquitin protein ligase WWP2 | Homology to ubiquitin-protein ligases, signal transduction, potentiate hormone-dependent activation of transcription | U96114 |
| Integrin β4 | Transmembrane glycoprotein receptors that mediate cell-matrix or cell-cell adhesion, and transduced signals that regulate gene expression and cell growth | X53587 |
| Hr44 | Membrane-associated type I antigen | X91103 |
| MAT8 | Chloride conductance | X93036 |
| Keratin 19 | Intermediate filament | Y00503 |
| CD176 | Unknown | Y10511 |
| Qip1 | Recognize nuclear localization signals (NLS) and dock NLS-containing proteins to the nuclear pore complex | AB002533 |
| Heparan sulfate proteoglycan (HSPG2) | Basement membrane | M85289 |
| Carnitine palmitoyltransferase 1 | Metabolism of complex lipids; glycerolipid metabolism | Y08682 |
| Fetal brain glycogen phosphorylase B | Metabolism of complex carbohydrates | U47025 |
| Fibronectin | Collagen binding, metastasis of melanoma cells | G3044 |
| Urokinase-type plasminogen receptor | Cell migration, pericellular proteolysis | U09937 |
| Inhibitor of apoptosis protein 1 (HI1AP1) | Inhibitor of apoptosis | U45876 |
| Amphiregulin (AR) | Growth factor (EGF family), wound healing, EGFR binding | M30703 |
| Macrophage inflammatory protein-2β | Cytokine/oncogene | X53800 |
| Apomucin | Protective cell membrane barrier | Z48314 |
| CD97 | Heterodimeric receptor associated with inflammation | U76764 |
| Mucin (intestinal) | Protective cell membrane barrier | M22406 |
| Mucin | Protective cell membrane barrier | M57417 |
| TR3 orphan receptor | Steroid receptor, immediate-early response gene/transcription factor | L13740 |

In order to visualize the high dimensional data (16 experiments), we employed a classical dimension reduction technique called principal component analysis (PCA). PCA [15] reduces the dimensionality of the data by transforming to a new combination of variables (the principal components) to summarize the features of the data. The relationships between the genes in the four tissue-specific clusters and the nonsquamous versus squamous epithelium cluster are depicted in Figure 2B.
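The per-gene normalization and the projection onto the first three principal components can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the analysis code used in the study: the function names (zscore_rows, pca_project), the use of the sample standard deviation (ddof=1), and the SVD-based PCA are all assumptions made for the example.

```python
import numpy as np

def zscore_rows(data):
    """Per-gene normalization: subtract the row mean, divide by the row SD."""
    mean = data.mean(axis=1, keepdims=True)
    std = data.std(axis=1, ddof=1, keepdims=True)  # sample SD (assumption)
    return (data - mean) / std

def pca_project(data, n_components=3):
    """Project genes (rows) onto the leading principal components via SVD.

    Returns the scores and the fraction of variance explained per component.
    """
    centered = data - data.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]

# Synthetic data: 5 groups of 30 genes across 16 experiments (assumption)
rng = np.random.default_rng(2)
profiles = rng.normal(scale=2.0, size=(5, 16))
genes = np.repeat(profiles, 30, axis=0) + rng.normal(scale=0.5, size=(150, 16))

scores, explained = pca_project(zscore_rows(genes))
print(scores.shape)
print(f"variance captured by 3 PCs: {explained.sum():.2f}")
```

Each row of scores gives the coordinates of one gene in the reduced three-dimensional space, so genes from the same cluster can be plotted with a common marker, as in Figure 2B.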

Control probes for clustering analyses

The presence of probe sets for 20 cytokeratins, including multiple probe sets for individual genes, provided a control for the clustering results. The cytokeratins are subunits of epithelial cell intermediate filaments that have well-characterized tissue-specific patterns. For example, immunohistochemical studies have shown that cytokeratins 4 and 13 are squamous-specific, whereas cytokeratins 8 and 19 are present in columnar epithelium typical of BE [7]. In addition, cytokeratin 7 staining appears to be specific for Barrett's epithelium [16]. Our analyses assigned 12 of 20 cytokeratins, including cytokeratins 4 and 13, to the cluster of genes with relatively high expression in squamous epithelium. Two of 20 cytokeratins, 7 and 19, were in the Barrett's-specific cluster, and three others, 8, 18, and 20, were present in the cluster that contained genes specific for nonsquamous GI epithelia.

Barrett's epithelium

The Barrett's-specific cluster consisted of 38 genes that are upregulated in the Barrett's epithelium (Table 2). These included genes associated with the cell cycle (P1cdc47, PCM-1), cell migration (urokinase-type plasminogen receptor, LUCA-1/HYAL1), growth regulation (TGF-β superfamily protein, amphiregulin, Cyr61), stress responses (calcyclin, ATF3, TR3 orphan receptor), epithelial cell surface antigens [epsilon-BP, epidermal cell surface antigen (ESA), integrin β4, mesothelin CAK-1 antigen precursor], and four mucins.

Duodenum

The duodenal cluster contained 211 genes that are upregulated in the duodenal epithelium, including a number of genes involved in lipid and glucose metabolism, such as SGLT1, intestinal fatty acid binding protein, apolipoproteins, and glucose-6-phosphatase. In addition, it contained the homeobox gene Cdx1, transcription factors HOK-2 (zinc finger), IFP35, HE47 (helix-loop-helix), and ZNF127 (ring zinc finger), insulin growth factor 1, cadherin 17, TIMP3, BRCA2, DRA, and pim 2.

Gastric

The gastric-specific cluster contained 105 genes that are upregulated in the gastric epithelium. Transcription factors included ZNF76, HCSX, late upstream transcription factor, HOX4D, and HTF10. In addition, there were several genes associated with various metabolic pathways, including ATP synthetase subunit c, cholecystokinin receptor, ceramide glucosyltransferase, mitochondrial creatine kinase (MtCK), muscle creatine kinase (CKMM), gastric H,K-ATPase β subunit, apolipoprotein C1, apolipoprotein A1 regulatory protein (ARP-1), type 1 inositol 1,4,5-triphosphate receptor, asparagine synthetase, and creatine kinase-B.

Squamous

The squamous-specific cluster contained 203 genes that are upregulated in the squamous epithelium. These included a number of different categories, such as oncogenes (pim-1, met, P47 LBC, JunB, H-ras); proteinase inhibitors (maspin, elafin, monocyte/neutrophil elastase inhibitor, cystatin M, cystatin B, SCCA, SCCA2/leupin, urokinase inhibitor, calpastatin); proteases (protease M, calcium-dependent protease); a series of cellular structure proteins (sprI, sprII, SPRR2B, SPR2-1, SPRR1A, involucrin, envoplakin, cystatin, elafin) that have been implicated in cellular stress responses; signal transduction and transcriptional regulators (KLF5, PRK2, APRF/STAT3, cold shock domain protein A, ZNFP36, MKK4, MAPKK, RIT, ephrin); and homeobox genes (backfoot, protein 7 Notch group).

Nonsquamous versus squamous epithelium

Our clustering analyses also identified a cluster of genes that were upregulated in the nonsquamous tissues compared to esophageal squamous epithelium. This cluster contained 259 genes that were expressed at similar levels in each of BE, gastric, and duodenum.


Discussion

The application of microarray technology permits a comprehensive analysis of the transcriptional patterns associated with human neoplasia. In addition, the identification of disease-specific expression patterns may be useful for molecular classification of neoplasias. Previous studies have shown that cancers have highly variable expression patterns even within the same tissue subtypes [17-19]. However, few studies have applied this technology to early stages of neoplasia. In our initial microarray investigation, we used pooled whole endoscopic biopsies to acquire sufficient mRNA and to increase representation of transcripts in each tissue. These biopsies contain a mixture of cell types (epithelial, inflammatory) present in each tissue. However, previous studies of DNA content abnormalities present in Barrett's epithelial cells showed that typically 60% to 80% of cells in our endoscopic biopsies are epithelial [20,21]. Therefore, we used comparisons of the different tissues, including those at the same anatomic site and exposed to reflux (BE and squamous), to identify tissue-specific clusters of genes. In addition, we developed tools to analyze large expression data sets and to compare expression of genes across multiple experiments.

Our initial hypothesis was that microarray analyses of Barrett's epithelium would identify disease-specific genes and provide insight into the molecular basis of early neoplasia. Furthermore, we proposed that the comparison of BE to gastric, duodenal, and esophageal squamous epithelia would reveal a differentiation pattern that was either distinct from the surrounding normal tissues of the upper GI tract or had high similarity to one of these tissues. This would identify developmental associations between the neoplastic Barrett's epithelium and one or more of the normal tissues.

In order to analyze and compare the expression patterns of all genes across multiple hybridizations, the array data from each chip must be normalized. The initial experiments in this study were done on Affymetrix Hu6800 chips that required four separate chips for coverage of all the genes in each experiment. The performance of individual chips varied across different experiments, making it difficult to interpret expression data. One approach for the normalization of microarray data is to use a robust set of genes common to each array as controls for normalization. However, the Hu6800 Affymetrix arrays used in our study contained only a small number of probe sets that were common to each A, B, C, and D chip. Therefore, we included at least one hybridization with the higher-density HuGeneFL chip for each tissue in order to normalize our data set prior to processing.

The correlation analyses with our normalized data set showed that, although highly similar, there was a clear distinction in the expression profiles of the three normal tissues of the upper GI tract (Table 1). Pairwise comparisons of each of these tissues revealed that duodenal and gastric tissues were more related to each other than either was to squamous epithelium. In contrast, the confidence intervals for the correlation coefficients between different pools of BE with normal gastric, squamous, and duodenum tissues overlapped, suggesting that BE shared extensive transcriptional similarity with all of these surrounding normal tissues. Thus, there was no evidence for a BE lineage-specific developmental association with one of the surrounding normal tissues. Several studies have shown that premalignant stages of BE contain different clonal populations of cells with multiple somatically acquired genetic abnormalities [9,22,23]. Therefore, the variability in the expression patterns of BE may reflect the genetic heterogeneity present in a neoplastic epithelium compared to surrounding normal tissues. The admixture of BE epithelium with inflammatory and stromal cells may also be a confounding factor, and future analyses using epithelium-enriched RNA may characterize this variability more clearly.

The chronic acid reflux in patients with GERD results in the denuding of the squamous epithelium of the esophagus and its replacement by metaplastic columnar Barrett's epithelium. Previous genotyping studies have shown that the development of Barrett's metaplasia and the subsequent evolution of neoplasia are associated with inactivation of the CDKN2A/p16 gene and the expansion of clonal populations of epithelial cells [8,9]. However, the pathways that mediate the clonal expansion events have not been well defined. A number of the genes in the BE-specific cluster have been shown to regulate steps in cellular adhesion and cell movement through extracellular matrices under normal physiological conditions (Table 2). These include HYAL1, fibronectin, mesothelin CAK1 antigen precursor, integrin β4, CYR61, HSPG2, and urokinase-type plasminogen receptor. In addition, this cluster contained calcyclin, ATF3, amphiregulin, and inhibitor of apoptosis protein 1, all of which could contribute to creating conditions permissive for the extensive expansion of epithelial cells seen in BE. A number of these proteins have commercially available antibodies. These could provide tools for further investigation into the role of these genes and their relationship to the somatic abnormalities that arise during the development and progression to cancer in BE.

The ability of epithelial cells to repopulate regions of mucosal injury is fundamental to the normal physiology of the GI tract. The efficient spreading and migration of epithelial cells across the basement membrane are key initial steps in this response. This process involves the detachment and migration of epithelial cells. Detachment of normal epithelial cells from their cell-cell or cell-substratum contacts usually results in an apoptotic response. However, rapid migration of epithelial cells over mucosal wounds occurs in the absence of apoptosis. The trefoil peptides, comprising the intestinal peptide ITF and the gastric peptides SP and pS2, are key mediators of the initial restitution of damaged mucosal regions in the GI tract [24-26]. Our results showed that the trefoils were absent in squamous epithelia, that duodenum had high levels of ITF, and that gastric tissues had both SP and pS2, consistent with other studies [27]. In contrast, high levels of all three trefoils were detected in the Barrett's tissues. These could contribute to the effects of the genes in the BE-specific cluster in producing the molecular phenotype of the early neoplasia.

The transcriptional profiles extend previous observations indicating that Barrett's shares phenotypic elements with small intestinal, gastric, and squamous esophageal epithelia. In addition to these observations, the genome-scale characterization of molecular phenotypes of the tissues of the upper GI tract allows investigation into multiple biological processes in each tissue in a single experiment. For example, the molecular phenotype that we characterized in the squamous epithelium contained a series of genes that are involved in the formation of the cornified cell envelope (CE), a protective barrier normally synthesized during late stages of differentiation by stratified squamous epithelia [28]. The main components of the CE include small proline-rich proteins, involucrin, envoplakin, cystatin, and elafin, whereas formation of the CE is the result of extensive cross-linking of several proteins catalyzed primarily by transglutaminases [29]. The CE, in combination with the cytokeratins present in the cluster, represents major structural components of squamous epithelia, providing a protective barrier against reflux-mediated tissue damage [28,29]. Defects in these barriers are associated with tissue susceptibility to injury and ulceration in various skin diseases [30,31]. The expression profile of squamous epithelium from patients in this study provides the potential for a comparative screen in patients without GERD for defects that may mediate susceptibility to the replacement of stratified squamous tissues with metaplastic columnar tissues in the esophagus.

Our approach of using pooled samples from whole biopsies of each tissue allowed the identification of distinct clusters of genes for each tissue and comparison of the relatedness of a neoplasia to its surrounding normal tissues. The clusters of duodenum- and gastric-specific genes included a number of previously characterized genes associated with the normal physiology of these tissues, including motilin, cholecystokinin, gastric inhibitory polypeptide, enterokinase, H,K-ATPase catalytic subunit, and trypsinogen. In addition, we have identified different transcription factors and homeobox genes that distinguish these tissues, providing useful reference points for analyzing their developmental basis. The BE-specific cluster included genes associated with a number of different pathways including cellular migration, alterations in the cell cycle, apoptosis, and stress responses. All of these have been associated with neoplasias [32-34].

To extend these studies to the evolution of cancer, neoplastic epithelial cells need to be purified from the tissue biopsies and characterized for somatic abnormalities. Although surrounding cells and stroma can contribute to tumor development, the evolution of cancer is dependent on the molecular phenotype of the premalignant cells from which it arises. Recent technical advances allow array experiments to be performed with increasingly smaller amounts of starting material, making it feasible to study the expression profiles of neoplasia in single biopsies. The genes identified in this study and the analytical approaches for comparing the expression profiles of different tissues across multiple experiments will provide a basis for further investigations. In particular, the study of gene expression patterns at well-defined transition stages of neoplastic progression should help identify the role of pathways in development of cancer.


[1] Levine DS, et al. (1989). Specialized metaplastic columnar epithelium in Barrett's esophagus. A comparative transmission electron microscopic study. Lab Invest 60(3), 418-32.

[2] Levine DS, et al. (1989). Correlation of ultrastructural aberrations with dysplasia and flow cytometric abnormalities in Barrett's epithelium. Gastroenterology 96(2 Pt. 1), 355-67.

[3] Moore JH, et al. (1994). Intestinal differentiation and p53 gene alterations in Barrett's esophagus and esophageal adenocarcinoma. Int J Cancer 56(4), 487-93.

[4] Regalado SP, et al. (1998). Abundant expression of the intestinal protein villin in Barrett's metaplasia and esophageal adenocarcinomas. Mol Carcinogen 22(3), 182-89.

[5] Wu GD, et al. (1993). Sucrase-isomaltase gene expression in Barrett's esophagus and adenocarcinoma. Gastroenterology 105(3), 837-44.

[6] Salo JA, et al. (1996). Cytokeratin profile suggests metaplastic epithelial transformation in Barrett's oesophagus. Ann Med 28(4), 305-309.

[7] Boch JA, et al. (1997). Distribution of cytokeratin markers in Barrett's specialized columnar epithelium. Gastroenterology 112(3), 760-65.

[8] Galipeau PC, et al. (1999). Clonal expansion and loss of heterozygosity at chromosomes 9p and 17p in premalignant esophageal (Barrett's) tissue. J Natl Cancer Inst 91(24), 2087-95.

[9] Barrett MT, et al. (1999). Evolution of neoplastic cell lineages in Barrett oesophagus. Nat Genet 22(1), 106-109.

[10] Young RA (2000). Biomedical discovery with DNA arrays. Cell 102(1), 9-15.

[11] Yeung KY, Haynor DR, and Ruzzo WL (2001). Validating clustering for gene expression data. Bioinformatics 17(4), 309-18.

[12] Jain AK, and Dubes RC (1988). Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, NJ.

[13] Ben-Dor A, Shamir R, and Yakhini Z (1999). Clustering gene expression patterns. J Comput Biol 6(3-4), 281-97.

[14] Pearson K (1896). Mathematical contributions to the theory of evolution: III. Regression, heredity, and panmixia. Philos Trans R Soc London, Ser A 187, 253-318.

[15] Jolliffe IT, and Morgan BJ (1992). Principal component analysis and exploratory factor analysis. Stat Methods Med Res 1(1), 69-95.

[16] Ormsby AH, et al. (2001). The utility of cytokeratin subsets in distinguishing Barrett's-related oesophageal adenocarcinoma from gastric adenocarcinoma. Histopathology 38(4), 307-11.

[17] Bittner M, et al. (2000). Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406(6795), 536-40.

[18] Alizadeh AA, et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503-11.

[19] Perou CM, et al. (1999). Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA 96(16), 9212-17.

[20] Rabinovitch PS, et al. (1989). Progression to cancer in Barrett's esophagus is associated with genomic instability. Lab Invest 60(1), 65-71.

[21] Reid BJ, et al. (1987). Barrett's esophagus. Correlation between flow cytometry and histology in detection of patients at risk for adenocarcinoma. Gastroenterology 93(1), 1-11.

[22] Prevo LJ, et al. (1999). p53 mutant clones and field effects in Barrett's esophagus. Cancer Res 59(19), 4784-87.

[23] Riegman PH, et al. (2001). Genomic alterations in malignant transformation of Barrett's esophagus. Cancer Res 61(7), 3164-70.

[24] Taupin DR, Kinoshita K, and Podolsky DK (2000). Intestinal trefoil factor confers colonic epithelial resistance to apoptosis. Proc Natl Acad Sci USA 97(2), 799-804.

[25] Sands BE, and Podolsky DK (1996). The trefoil peptide family. Annu Rev Physiol 58, 253-73.

[26] Mashimo H, et al. (1996). Impaired defense of intestinal mucosa in mice lacking intestinal trefoil factor. Science 274(5285), 262-65.

[27] Taupin D, et al. (1999). The trefoil gene family are coordinately expressed immediate-early genes: EGF receptor- and MAP kinase-dependent interregulation. J Clin Invest 103(9), R31-38.

[28] Cabral A, et al. (2001). Structural organization and regulation of the small proline-rich family of cornified envelope precursors suggest a role in adaptive barrier function. J Biol Chem 276(22), 19231-37.

[29] Candi E, et al. (1999). Transglutaminase cross-linking properties of the small proline-rich 1 family of cornified cell envelope proteins. J Biol Chem 274(11), 7226-37.

[30] Fujimoto W, et al. (1997). Differential expression of human cornifin alpha and beta in squamous differentiating epithelial tissues and several skin lesions. J Invest Dermatol 108(2), 200-204.

[31] Aeschlimann D, and Thomazy V (2000). Protein crosslinking in assembly and remodelling of extracellular matrices: the role of transglutaminases. Connect Tissue Res 41(1), 1-27.

[32] Stetler-Stevenson WG, and Yu AE (2001). Proteases in invasion: matrix metalloproteinases. Semin Cancer Biol 11(2), 143-52.

[33] Evan GI, and Vousden KH (2001). Proliferation, cell cycle and apoptosis in cancer. Nature 411(6835), 342-48.

[34] Hanahan D, and Weinberg RA (2000). The hallmarks of cancer. Cell 100(1), 57-70.