Procedia Computer Science 90 (2016) 169-174


International Conference On Medical Imaging Understanding and Analysis 2016, MIUA 2016,

6-8 July 2016, Loughborough, UK

Spatial Relations of Mammographic Density Regions and their Association with Breast Cancer Risk

Maya Alsheh Ali a,b,*, Mickael Garnier c,d, Keith Humphreys a,b

Abstract

We present a new approach for characterising the shape and the spatial relationships of different categories of density in mammograms. Descriptions of regions are encoded using a forces histogram method and across-image variation is captured using functional principal component analysis. We evaluate the association of the features with breast cancer based on a pilot case-control study using logistic regression with percent density, age, and body mass index included as adjustment variables. The spatial relations were significantly associated with breast cancer status (p = 0.009). Our approach can provide insights into the role of different density regions in the development of breast cancer.

© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the Organizing Committee of MIUA 2016

Keywords: Mammography, Spatial organisation, forces histogram, Breast cancer risk

1. Introduction

Breast cancer is the most common cancer diagnosed in women and, despite clinical advancements, around 20% of breast cancer patients in the Nordic countries still die of their disease. The most established image-based risk factor for breast cancer is percent mammographic density (PD). This is classically evaluated as the ratio of the fibroglandular tissue area to the area of the entire breast in a mammogram [1]. Volumetric measures of PD have also been developed; these are obtained from a 2D image by, for example, placing phantoms between the plates of the mammography machine to act as a reference from which dense tissue thickness can be estimated. It is not clear which type of measure is optimal for risk prediction [2], but, in general, women with high PD (> 75%) have an approximately 6-fold increased risk compared to women with very low PD. Besides PD (or the absolute dense area/volume), there is likely to be additional relevant information that can be extracted from the mammogram. A number of studies have tried to measure heterogeneity of the parenchymal pattern by using texture-based methods ranging from simple (e.g. intensity histogram) to complex (e.g. based on scale space features). Texture features can be extracted from the entire breast region [3] or from specific regions of interest, such as the retroareolar area [4], the central area of the breast or across a lattice covering the breast.

* Corresponding author E-mail address: maya.alsheh.ali@ki.se (Maya Alsheh Ali).

ISSN 1877-0509. doi:10.1016/j.procs.2016.07.019

a Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-17177 Stockholm, Sweden
b Swedish eScience Research Centre (SeRC), Karolinska Institutet, SE-17177 Stockholm, Sweden
c Institut Curie, PSL Research University, CNRS, UMR3664, F-75005 Paris, France
d Sorbonne Universities, UPMC Univ Paris 06, CNRS, UMR3664, F-75005 Paris, France



Fig. 1. Illustration of image segmentation. The original image (in (a)) segmented into 4 regions using fuzzy c-means is in (b) and using Otsu's method is in (c), where the colours dark blue, blue, yellow and red correspond to fatty, semi-fatty, semi-dense and dense tissue regions respectively.

It is also interesting to study the spatial distribution/shape of dense tissue in the breast and its relationship with breast cancer risk. It has been suggested that the relative distribution of adipose and fibroglandular tissue is involved in breast cancer development [5]. There are multiple ways to describe the spatial organisation of objects inside an image, the most basic relying on qualitative measures such as "to the right of", "close to", or "surround" [6], which are examples of directional, distance and topological descriptions respectively. Keller et al. [7] measured the distance from the centroid of a segmented region of dense tissue to the skin line. Since the relation between two objects is rarely absolute, fuzzy logic may be useful for assessing the degree of truth of a given relation. The histogram of angles [8] encapsulates the fuzzy directional relations across all considered directions, and several methods stem from this approach [9]. The forces histogram [10] (FH) is a quantitative fuzzy spatial relation description that takes into account the directional and distance relationships as well as the shapes of the objects. The FH approach has been applied to lung CT images to classify lesions according to their position relative to anatomical landmarks [11]. In this article we describe how it can be used to comprehensively assess whether the spatial organisation of different regions of tissue, or the shapes of these regions, is associated with breast cancer risk.

2. Methods

We start our methods description from the point at which the breast region has already been segmented (i.e. tags, background and pectoral muscle have been removed). After segmenting the breast into regions according to density, we used the FH approach to capture the shape and spatial relations information. We captured the variation inside the set of FHs using functional principal component analysis [12] and assessed associations between principal components and breast cancer status by fitting logistic regression models.

2.1. Segmentation of regions of density

Segmentation of regions within the breast, on a mammogram, is an important and challenging problem and methods are under continual development [13]. The number of regions/density categories used in the literature has varied widely [13] (anything from two to thirteen) and its choice is likely to vary according to the purpose for which the regions are used (e.g. for quantifying density amount, texture or spatial organisation). Here we segment the breast into four regions, which is a common choice and is the same number of regions used in Wolfe's original parenchymal patterns [14] (known as N1, P1, P2 and DY) for categorising according to both the extent of densities and their characteristics (prominence of ducts and dysplasia). Four regions, representing very dense tissue, dense with structures (fibrotic stromal tissue and glandular tissue), fatty tissue (Wolfe's normal breast pattern, N1), and fatty breast edge, have also been used in studies of textural features [15].

We have used two methods to see how sensitive our overall analysis is to the choice of segmentation algorithm, the first being the fuzzy c-means (clustering) method (Fig. 1 (b)) and the second being Otsu's (global thresholding) method (Fig. 1 (c)). Both are considered to provide accurate, albeit somewhat different, segmentations [13].
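As a rough illustration of the clustering option, the following Python sketch groups pixel intensities into four classes with a plain fuzzy c-means loop. It is not the implementation used in this study: the function name `fuzzy_c_means_1d`, the fuzzifier m = 2, the quantile-based initialisation and the use of intensity as the only feature are all assumptions made for the example.

```python
import numpy as np


def fuzzy_c_means_1d(image, n_clusters=4, m=2.0, n_iter=100, tol=1e-6):
    """Cluster pixel intensities with fuzzy c-means (hypothetical helper).

    Returns a label map (1 = darkest cluster, n_clusters = brightest)
    and the cluster centres sorted by intensity.
    """
    v = np.asarray(image, dtype=float).ravel()
    # Assumed initialisation: evenly spaced intensity quantiles.
    centres = np.quantile(v, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        # Distance of every pixel to every centre (epsilon avoids division by 0).
        dist = np.abs(v[:, None] - centres[None, :]) + 1e-12
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Centre update: mean of the intensities weighted by u^m.
        um = u ** m
        new_centres = (um * v[:, None]).sum(axis=0) / um.sum(axis=0)
        converged = np.max(np.abs(new_centres - centres)) < tol
        centres = new_centres
        if converged:
            break
    order = np.argsort(centres)
    # Hard labels: cluster with the highest membership, ordered by intensity.
    labels = np.argmax(u[:, order], axis=1) + 1
    return labels.reshape(np.shape(image)), centres[order]


# Toy "mammogram": four horizontal bands of increasing intensity.
toy = np.repeat(np.array([10.0, 60.0, 120.0, 200.0]), 2)[:, None] * np.ones((1, 8))
labels, centres = fuzzy_c_means_1d(toy)
```

Ordering the labels by cluster centre mirrors the convention used here of numbering regions from fatty (1) to dense (4); a multi-level Otsu threshold could be substituted for the clustering step to reproduce the second option.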

2.2. Shape and spatial relations measurement

To comprehensively measure the relative positions of the different regions of tissue and summarise their shapes, we use the FH method [16]. Although the idea of the FH is elegant, its computation is demanding and complicated. Here we describe the basic steps and illustrate its principle. Figure 2 (a) represents two regions/objects A (red) and B (blue), which in the context of our work represent two regions of homogeneous dense tissue. The FH plots measurements obtained by sweeping across the image using a series of parallel lines at an angle $\theta$. These parallel lines are drawn over the image such that every pixel in the image is touched by exactly one line (Fig. 2 (a)). This is done using Bresenham's line drawing algorithm. For each line a weight is calculated as the sum of the inverse squared distances between all pairs of pixels on that line for which one pixel belongs to the first object and the other to the second object. Distance is measured as the Euclidean distance between the two pixels being considered. The inverse of the squared distance is used in order to place most weight on the closest pairs of pixels. The value of the FH at the angle $\theta$ is the sum of the weights of all the lines in that direction. This procedure is repeated across a range of angles between 0 and $2\pi$. The computation of the FH requires pre-setting a single parameter, the number of angles. This value defines the length of the histograms and the angular precision of the measure. In addition, it plays an important role in the complexity of the FH computation, which is $O(a\,n\sqrt{n})$, where $n$ is the number of pixels in the image and $a$ is the number of considered angles. We can summarise the computation of the spatial relations FH between the two regions A and B by

$$F^{AB}(\theta) = \sum_{C_\theta} \sum_{a \in (C_\theta \cap A)} \sum_{b \in (C_\theta \cap B)} \frac{1}{d^2(a, b)},$$

where $C_\theta$ is the set of all lines with an angle $\theta$ cutting through both objects and $d(a, b)$ is the distance between the pixels $a$ and $b$. $(C_\theta \cap A)$ denotes the set of pixels $a$ in object A through which lines in $C_\theta$ pass (and $(C_\theta \cap B)$ is used similarly). Due to the symmetry property of the FH [10], the spatial relationships between B and A can be derived from the FH between A and B.
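The sum of inverse squared distances can be illustrated with a brute-force Python sketch. It is a simplification and not the line-sweep algorithm described above: instead of drawing Bresenham lines, it visits every pixel pair, assigns the pair to the nearest angle bin according to the direction of the vector joining the two pixels, and adds $1/d^2$ to that bin. The function name `pairwise_force_histogram` and the angle convention (direction from the pixel in A to the pixel in B, counter-clockwise, image rows pointing down) are assumptions for this example and may differ from the convention of the original FH papers.

```python
import numpy as np


def pairwise_force_histogram(mask_a, mask_b, n_angles=180):
    """Brute-force approximation of a forces histogram (hypothetical helper).

    Every pair (a in A, b in B) contributes 1 / d(a, b)^2 to the bin whose
    angle is closest to the direction of the vector from a to b.
    Memory use is O(|A| * |B|), so this is only suitable for small masks.
    """
    ra, ca = np.nonzero(mask_a)
    rb, cb = np.nonzero(mask_b)
    dx = (cb[None, :] - ca[:, None]).astype(float)
    # Image rows grow downwards, so flip the sign to get the usual y axis.
    dy = (ra[:, None] - rb[None, :]).astype(float)
    d2 = dx ** 2 + dy ** 2
    valid = d2 > 0                      # skip coincident pixels
    angles = np.mod(np.arctan2(dy[valid], dx[valid]), 2 * np.pi)
    step = 2 * np.pi / n_angles
    # Bins are centred on k * step; the last half-bin wraps around to bin 0.
    bins = np.rint(angles / step).astype(int) % n_angles
    return np.bincount(bins, weights=1.0 / d2[valid], minlength=n_angles)


# Two single-pixel objects three pixels apart on the same row.
obj_a = np.zeros((8, 8), dtype=bool)
obj_b = np.zeros((8, 8), dtype=bool)
obj_a[5, 2] = True
obj_b[5, 5] = True
fh_ab = pairwise_force_histogram(obj_a, obj_b)
fh_ba = pairwise_force_histogram(obj_b, obj_a)
```

In this toy case the whole weight of 1/9 falls in the bin at 0 for `fh_ab` and in the bin half a turn away for `fh_ba`, which is the symmetry referred to above.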


Fig. 2. Illustration of the computation of the forces histogram between the objects A (red) and B (blue). (a) shows both objects and the parallel lines along one specific angle, $\theta = 18^\circ$. The forces histogram (with angles between 0° and 360°) describing the spatial relationships between A and B is in (b). The bin highlighted in magenta corresponds to the angle $\theta = 18^\circ$.

In addition to describing spatial relationships, the FH can be used to describe the shape of (segmented) regions [17] by computing the histogram on pairs of pixels within the same region. In this case, unit weights are attached to the pixel distances, i.e. we simply count the number of pairs of pixels falling on lines placed at an angle $\theta$.

Since we segment images into four regions, each mammogram is described by ten FHs: six (4 choose 2) representing the relative positions of the different pairs of regions and four measuring the shape of each region. An example of FH descriptions for a mammographic image is shown in Fig. 3. Because the FH measuring the relative positions of two objects takes the distance between them into consideration, the further apart the two objects are, the lower the magnitude of the measurements will be; see Fig. 3 (c). The amplitude of the FH measuring shape is positively associated with the size of the object, but the actual shape description is not affected by object size and it is therefore straightforward to normalise these FHs if required. In addition, the FH for the shape description is $\pi$-periodic (see Fig. 3 (b)), thus the shape FH (but not the spatial relations FH) can be computed for the angles between 0 and $\pi$ to save memory and computation while removing redundant information.
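The unit-weight shape description can be sketched in the same brute-force spirit. The Python example below counts each unordered pair of pixels of one region once and folds its direction into the half-open interval from 0 to $\pi$, which is why only half of the angular range is needed. As before, this replaces the line-sweep computation with a pairwise approximation, and the function name `shape_histogram` is an assumption for the example.

```python
import numpy as np


def shape_histogram(mask, n_angles=90):
    """Count pixel pairs of one region by direction, folded into [0, pi).

    Hypothetical helper; memory use is quadratic in the number of pixels,
    so it is only meant for small toy masks.
    """
    rows, cols = np.nonzero(mask)
    i, j = np.triu_indices(rows.size, k=1)     # each unordered pair once
    dx = (cols[j] - cols[i]).astype(float)
    dy = (rows[i] - rows[j]).astype(float)     # image rows grow downwards
    # A pair has no orientation, so theta and theta + pi are equivalent.
    angles = np.mod(np.arctan2(dy, dx), np.pi)
    step = np.pi / n_angles
    bins = np.rint(angles / step).astype(int) % n_angles
    return np.bincount(bins, minlength=n_angles)


# A horizontal segment of four pixels: all six pairs lie at angle 0.
segment = np.zeros((5, 5), dtype=bool)
segment[2, 0:4] = True
shape_fh = shape_histogram(segment)
```

Dividing `shape_fh` by its sum would give a size-independent description, in line with the remark above that the shape FHs are straightforward to normalise.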

2.3. Statistical analysis

Fig. 3. Example of FH descriptions of a mammogram. (a) shows an image segmented into four regions (fatty (1), semi-fatty (2), semi-dense (3) and dense (4)) using the fuzzy c-means method. The FHs describing the shapes of each region (regions 1 to 4) are in (b) and the FHs describing spatial relationships between pairs of regions (1&2, 1&3, 1&4, 2&3, 2&4 and 3&4) are in (c); the horizontal axis of each plot is the angle.

To summarise the information in each FH we use functional principal component analysis [12] (fPCA). We view the FH as a function of the angle $\theta$; see Fig. 3. fPCA attempts to characterize the dominant modes of variation of a sample of functions around an overall mean trend function. We carry out an fPCA on each FH using the method included in PACE (a Matlab package for functional data). This method [18] first estimates the mean and the covariance functions using scatter plot smoothers that remove measurement errors, and then estimates eigenvalues and eigenfunctions. In the last step, conditional expectation is used to provide predictions of the functional principal component (fPC) scores. This process transforms functional data into K-dimensional multivariate data consisting of the first K fPC scores. We evaluate their association with case-control status using logistic regression (case-control status as dependent variable and fPCs as independent variables). We include age, body mass index and PD as adjustment variables. Logistic regression models are fitted in R. We test association with spatial relations and shape fPCs separately using likelihood ratio tests. P values less than 0.01 are considered statistically significant; this threshold is a conservative one since we carry out very few (four) tests in our main analyses, and these are by no means independent. Based on the fPCs, we use the modes of variation [12] for the FHs to visualize and describe the variation in the FHs contributed by each eigenfunction. The set of functions for the modes of variation that are viewed simultaneously over $\alpha \in [-2, +2]$ are given by $V_k(\theta) = \mu(\theta) \pm \alpha \sqrt{\lambda_k}\,\phi_k(\theta)$, for the $k$th retained fPC of the considered FH, where $\mu(\theta)$ is the mean function, $\lambda_k$ is the $k$th eigenvalue, and $\phi_k(\theta)$ is the $k$th eigenfunction.
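For histograms sampled on a common, dense grid of angles, the core of this analysis can be approximated with an ordinary principal component decomposition. The Python sketch below is a simplification and not the PACE procedure: it omits the smoothing of the mean and covariance functions and uses plain projections rather than conditional expectations for the scores. The function names `fpca_dense` and `modes_of_variation`, the 85% variance threshold default and the synthetic curves are assumptions for the example.

```python
import numpy as np


def fpca_dense(curves, var_threshold=0.85):
    """PCA of curves sampled on a common grid (rows = curves).

    Returns the mean curve, the retained eigenvalues, the retained
    eigenfunctions (unit Euclidean norm on the grid) and the scores.
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    centred = curves - mean
    # Rows of vt are the eigenvectors of the sample covariance matrix.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = s ** 2 / (curves.shape[0] - 1)
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Smallest K whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(explained, var_threshold)) + 1
    scores = centred @ vt[:k].T
    return mean, eigvals[:k], vt[:k], scores


def modes_of_variation(mean, eigval, eigfun, alphas=(-2, -1, 0, 1, 2)):
    """Curves mean + alpha * sqrt(eigenvalue) * eigenfunction."""
    return np.array([mean + a * np.sqrt(eigval) * eigfun for a in alphas])


# Synthetic "histograms": two independent sources of variation.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 180, endpoint=False)
a1 = rng.normal(0.0, 3.0, size=50)
a2 = rng.normal(0.0, 1.0, size=50)
curves = 5.0 + a1[:, None] * np.sin(theta) + a2[:, None] * np.cos(2.0 * theta)

mean, eigvals, eigfuns, scores = fpca_dense(curves)
modes = modes_of_variation(mean, eigvals[0], eigfuns[0])
```

The rows of `scores` play the role of the K-dimensional multivariate data that enter the regression models, and `modes` holds the curves that a modes-of-variation plot would display for the first component.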

3. Materials

Mammograms from 500 women (250 cases, 250 controls) were selected from CAHRES (CAncer and Hormone REplacement Study), a population-based breast cancer case-control cohort initiated in the mid-1990s [19]. For women diagnosed with cancer, we used the image of the contralateral breast, whereas for controls an image of a single side was selected at random. We used the mediolateral oblique (MLO) view since it offers the best opportunity to visualize the maximum amount of breast tissue in a single image. All mammograms included here were taken within two years prior to diagnosis (cases) or prior to date of questionnaire (controls). Mammograms were digitized with an Array 2905HD Laser Film Digitizer, Array Corporation, Hampton, NH, USA. Density resolution was set at 12 bit, spatial resolution at 5.0 mm and optical density at 0-4.7. The size of the images was 4770 x 3580 pixels. For the present study we used data on age and body mass index (BMI), which were available for all included women. For each image, area-based PD had been measured using the user-assisted approach of Cumulus [20] and PD was square-root-transformed prior to analysis.

4. Experiments

We will refer to the four segmented regions as 1, 2, 3, and 4, corresponding respectively to fatty, semi-fatty, semi-dense, and dense tissue. The segmentation was cleaned using a morphological erosion with a square structuring element of width 3 pixels. Once the images were segmented, the FH was applied to measure the relative position between each pair of regions and the shape of each region. To reduce the computational cost of the FH, we rescaled each image to 20% of its original size (this reduced the FH computation time from 367 seconds to 11 seconds). The number of angles was set to 180 for the spatial relations FH and 90 for the shape FH (in both cases a step of $\pi/90$ was used). We carried out a separate fPCA of each of the ten FHs. From each analysis we retained only the first K functional principal components, which accounted for a cumulative variance of 85%. The number of retained fPCs was the same for both segmentation methods.
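The preprocessing settings in this paragraph can be mimicked on toy data with SciPy. The Python sketch below is only indicative: it uses `scipy.ndimage.binary_erosion` and `scipy.ndimage.zoom`, whereas the software actually used for these steps is not stated, and applying the rescaling to a label map with nearest-neighbour interpolation is an assumption for the example.

```python
import numpy as np
from scipy import ndimage

# Toy binary region: a 5 x 5 square inside a 9 x 9 image.
region = np.zeros((9, 9), dtype=bool)
region[2:7, 2:7] = True

# Morphological erosion with a 3 x 3 square structuring element.
eroded = ndimage.binary_erosion(region, structure=np.ones((3, 3), dtype=bool))

# Rescale a (toy) label map to 20% of its size. order=0 selects
# nearest-neighbour interpolation so that labels stay integer-valued;
# whether the study rescaled the image or the label map is an assumption.
label_map = np.ones((50, 40), dtype=np.uint8)
small = ndimage.zoom(label_map, 0.2, order=0)

# Angle grids with a step of pi/90.
step = np.pi / 90
spatial_angles = np.arange(180) * step    # 180 angles covering [0, 2*pi)
shape_angles = np.arange(90) * step       # 90 angles covering [0, pi)
```

The erosion removes a one-pixel rim from each region, which is what trims thin spurious structures from the segmentation before the FHs are computed.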

For each segmentation method, we included the (five) retained fPCs for the shape FHs, along with age, BMI and PD, as covariates in a logistic regression model for case-control status. To evaluate whether the fPCs add significant information to the other factors, we used a likelihood ratio test with degrees of freedom equal to the number of the considered fPCs. We repeated the same procedure using the (14) retained fPCs for the spatial relations.
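The nested-model comparison can be written out explicitly. The models in this study were fitted in R; the Python sketch below is therefore only an illustration of the same likelihood ratio test, with a small Newton-Raphson logistic regression and entirely synthetic data. The function names `fit_logistic` and `likelihood_ratio_test`, the simulated covariates and the effect sizes are assumptions for the example.

```python
import numpy as np
from scipy.stats import chi2


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        delta = np.linalg.solve(hess, grad)
        beta += delta
        if np.max(np.abs(delta)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, loglik


def likelihood_ratio_test(adjust, fpcs, y):
    """Test whether the fPC scores add to a model with adjustment variables."""
    _, ll_reduced = fit_logistic(adjust, y)
    _, ll_full = fit_logistic(np.column_stack([adjust, fpcs]), y)
    stat = 2.0 * (ll_full - ll_reduced)
    df = fpcs.shape[1]                  # one degree of freedom per fPC
    return stat, df, chi2.sf(stat, df)


# Synthetic stand-ins: 3 adjustment variables and 14 fPC scores.
rng = np.random.default_rng(0)
n = 500
adjust = rng.normal(size=(n, 3))
fpcs = rng.normal(size=(n, 14))
eta = 0.5 * adjust[:, 0] + 0.8 * fpcs[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

stat, df, p_value = likelihood_ratio_test(adjust, fpcs, y)
```

Testing all retained fPCs jointly, rather than one at a time, keeps the number of tests small, which is the rationale given earlier for the 0.01 significance threshold.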

We obtained evidence that spatial relations hold important information associated with the risk of breast cancer which is independent of PD (p = 0.009); see Table 1. We note that without adjusting for PD, the significance of the spatial fPCs was similar (Table 1) and that the estimated coefficient for PD was hardly changed by including the fPCs in the model (data not shown), suggesting that the effects of PD and the fPCs are largely independent of each other. For both segmentation methods, the shape did not add significant information for explaining case-control status. The fPCs of the relative position of the different regions to the fatty breast edge (region 1) appear to be the most important. A stepwise selection of fPCs (using the step command in R) retained fPC?2, fPC23, fPC14 and fPC13 for fuzzy c-means segmentation and fPC12, fPC13, fPC14 and fPC34 for Otsu's method (where $\mathrm{fPC}^{i}_{xy}$ denotes the $i$th fPC for the spatial relations FH of the regions $x$ and $y$). For both segmentations, fPC^ is the strongest component associated with breast cancer status. Figure 4 shows the modes of variation plots for the first two fPCs of spatial relations between regions 1 and 2, along with a plot of its FH data (Figure 4(a)), for the fuzzy c-means segmentation method. fPC^ appears, loosely speaking, to be contrasting the magnitudes of the FHs at angles of around 175° and 50° (Figure 4(c)).

Table 1. P-values obtained from likelihood ratio tests using two different segmentation methods.

                      Adjusted for age, BMI, PD      Adjusted for age, BMI
Segmentation method   Shape    Spatial relations     Shape    Spatial relations
fuzzy c-means         0.186    0.009                 0.093    0.007
Otsu                  0.159    0.044                 0.087    0.030

5. Discussion

In this article we have presented a new way to analyze mammograms by studying the shapes of different types of tissue and their spatial relationships. All descriptions were encoded using forces histograms, which provide fuzzy quantitative representations of the relative positions between pairs of regions and of the shape of each region. Variations in the FH functions across mammograms were captured using a functional principal component analysis. We showed that the investigated spatial relations features jointly add significant information (p = 0.009) to the model, thus paving the way for an extensive validation on larger datasets of both analog and digital mammograms as well as a biological interpretation of these findings. Assuming these results can be validated, their (biological) interpretation will require an in-depth study of the retained fPCs to identify more closely which spatial relationships yield information associated with cancer risk. In order to interpret such results it may be necessary to scrutinise individual images carefully and modify the FH approach to obtain metrics under specific assumptions concerning the segmented regions. Examples of such metrics are the surroundness and the inner-adjacency [16].

Rotation invariance of the FH can be deduced from its definition, since a rotation of the image by a given angle corresponds to a circular shift of the FH by the number of bins corresponding to that angle. This property could be exploited to address inconsistencies in the way that the breast is positioned in the mammogram. For instance, an angle of 45° could be fixed as a reference and, according to the angle of the pectoral muscle in relation to the horizontal axis of the image, the FH could be shifted to the left or to the right.
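As a minimal sketch of this idea, a histogram covering the full circle can be realigned with a circular shift. The function name `rotate_histogram` is hypothetical, and the sign of the shift (which direction counts as a positive rotation) is an assumption that would need to match the angle convention of the FH implementation in use.

```python
import numpy as np


def rotate_histogram(fh, angle_deg):
    """Circularly shift a 360-degree histogram by a rotation angle."""
    n_bins = fh.size
    # Number of bins corresponding to the rotation angle.
    shift = int(round(angle_deg * n_bins / 360.0))
    return np.roll(fh, shift)


fh = np.arange(180, dtype=float)        # toy histogram, 2 degrees per bin
aligned = rotate_histogram(fh, 18.0)    # a shift of 9 bins
restored = rotate_histogram(aligned, -18.0)
```

With 180 bins the shift is exact only for multiples of 2°; other angles are rounded to the nearest bin, so the angular precision of the alignment is bounded by the number of angles chosen for the FH.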

In our study we have used only one projection image of the breast (MLO). Taking into account another mammogram view, such as the Cranio-Caudal view, could help to predict cancer risk more accurately by adding more precise information about the relative positions of the different types of tissue. It would also be interesting to extend the method described here to MRI images, which provide a detailed 3D view of the breast where patterns of fibroglandular tissue are clearly visible and not subject to tissue overlapping.

We envisage that the approach we describe will be useful in several breast cancer research contexts, for example, for studying differences between breast cancer subtypes (which can be defined in different ways, e.g. by tumor grade or by estrogen receptor status) and for studying the role of density in screening sensitivity. In each case, different aspects of density are likely to be important and application of our comprehensive approach to studying shape and spatial relations may generate specific, testable hypotheses.

Acknowledgements

This research was supported by the Swedish Cancer Society (grant number CAN 2014/472) and the Cancer Health Risk Prediction Center (CRISP; www.crispcenter.org), a Linnaeus Center (Contract ID 70867902) financed by the Swedish Research Council.

References

1. Boyd, N.F., Guo, H., Martin, L.J., Sun, L., Stone, J., Fishell, E., et al. Mammographic density and the risk and detection of breast cancer. New England Journal of Medicine 2007;356(3):227-236.

2. Boyd, N., Martin, L., Gunasekara, A., Melnichouk, O., Maudsley, G., Peressotti, C., et al. Mammographic density and breast cancer risk: evaluation of a novel method of measuring breast tissue volumes. Cancer Epidemiology Biomarkers & Prevention 2009;18(6):1754-1762.

3. Nielsen, M., Vachon, C.M., Scott, C.G., Chernoff, K., Karemore, G., Karssemeijer, N., et al. Mammographic texture resemblance generalizes as an independent risk factor for breast cancer. Breast Cancer Research 2014;16:R37.

4. Wei, J., Chan, H.P., Wu, Y.T., Zhou, C., Helvie, M.A., Tsodikov, A., et al. Association of computerized mammographic parenchymal pattern measure with breast cancer risk: a pilot case-control study. Radiology 2011;260(1):42-49.

5. Pereira, S.M.P., McCormack, V.A., Moss, S.M., dos Santos Silva, I. The spatial distribution of radiodense breast tissue: a longitudinal study. Breast Cancer Research 2009;11(3):1-12.

6. Freeman, J. The modelling of spatial relations. Computer Graphics and Image Processing 1975;4(2):156-171.

7. Keller, B.M., Conant, E.F., Oh, H., Kontos, D. Breast Imaging: 11th International Workshop; chap. Breast Cancer Risk Prediction via Area and Volumetric Estimates of Breast Density. Springer Berlin Heidelberg; 2012, p. 236-243.

8. Miyajima, K., Ralescu, A. Spatial organization in 2D images. In: Proc. FUZZ. IEEE; 1994, p. 100-105.

9. Bloch, I. Fuzzy spatial relationships for image processing and interpretation: a review. Image and Vision Computing 2005;23(2):89-110.

10. Matsakis, P., Wendling, L. A new way to represent the relative position between areal objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 1999;21(7):634-643.

11. Shyu, C.R., Matsakis, P. Spatial lesion indexing for medical image databases using force histograms. In: Computer Vision and Pattern Recognition IEEE; vol. 2. IEEE; 2001, p. II-603.

12. Shang, H.L. A survey of functional principal component analysis. Advances in Statistical Analysis 2014;98(2):121-142.

13. He, W., Juette, A., Denton, E.R.E., Oliver, A., Martí, R., Zwiggelaar, R. A review on automatic mammographic density and parenchymal segmentation. International Journal of Breast Cancer 2015;2015.

14. Wolfe, J.N. Breast patterns as an index of risk for developing breast cancer. American Journal of Roentgenology 1976;126(6):1130-1137.

15. Linguraru, M. Feature Detection in Mammographic Image Analysis. PhD thesis; University of Oxford; 2002.

16. Matsakis, P. Understanding the spatial organization of image regions by means of force histograms: a guided tour. In: Applying soft computing in defining spatial relations. Springer; 2002, p. 99-122.

17. Tabbone, S., Wendling, L., Tombre, K. Matching of graphical symbols in line-drawing images using angular signature information. Springer, Document Analysis and Recognition 2003;6(2):115-125.

18. Yao, F., Müller, H.G., Wang, J.L. Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 2005;100(470):577-590.

19. Eriksson, L., Czene, K., Rosenberg, L., Humphreys, K., Hall, P. The influence of mammographic density on breast tumor characteristics. Breast Cancer Research and Treatment 2012;134(2):859-866.

20. Byng, J.W., Boyd, N., Fishell, E., Jong, R., Yaffe, M.J. The quantitative analysis of mammographic densities. Physics in Medicine and Biology 1994;39(10):1629.