Diagonal Discriminant Analysis for Gene-Expression Based Tumor Classification

Gokmen Zararsiz 1, Selcuk Korkmaz 1, Dincer Goksuluk 1, Vahap Eldem 2, and Ahmet Ozturk 3
1. Department of Biostatistics, Hacettepe University, Ankara, Turkey
2. Department of Biology, Istanbul University, Istanbul, Turkey
3. Department of Biostatistics, Erciyes University, Kayseri, Turkey

Abstract— Reliable and accurate tumor classification is crucial for the successful diagnosis and treatment of cancer. With recent advances in molecular genetics, it is possible to measure the expression levels of thousands of genes simultaneously. This makes it feasible to gain a more complete understanding of the molecular markers that distinguish tumors and to make a more accurate diagnosis. Linear and quadratic discriminant analysis are common statistical approaches to classification. However, in gene expression datasets the number of genes (p) is much larger than the number of tissue samples (n). As a result, the sample covariance matrices are singular, which limits the use of these methods. Diagonal linear and diagonal quadratic discriminant analyses are more recent approaches that ignore the correlation among genes and thereby allow high-dimensional classification. The nearest shrunken centroids algorithm is an extension of diagonal discriminant analysis that also selects the genes contributing most to class prediction. In this study, we discuss these algorithms and demonstrate their use on both microarray and RNA sequencing datasets.
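As a rough illustration of the diagonal idea described above, the following Python sketch implements a minimal diagonal linear discriminant analysis with NumPy. It is not the authors' implementation: the function names `fit_dlda` and `predict_dlda`, the small constant `eps`, and the synthetic data are assumptions made for this example. Each gene's pooled within-class variance is estimated on its own, so no p x p covariance matrix is ever inverted and the method remains usable when p is much larger than n.

```python
import numpy as np


def fit_dlda(X, y):
    """Estimate class means, pooled per-gene variances, and class priors.

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) class labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted unique labels
    n = X.shape[0]
    means = np.vstack([X[y == k].mean(axis=0) for k in classes])
    # Residuals of each sample from its own class mean.
    resid = X - means[np.searchsorted(classes, y)]
    # Pooled within-class variance per gene: only the diagonal is estimated.
    var = (resid ** 2).sum(axis=0) / (n - len(classes))
    priors = np.array([(y == k).mean() for k in classes])
    return classes, means, var, priors


def predict_dlda(model, X, eps=1e-8):
    """Assign each sample to the class with the largest discriminant score."""
    classes, means, var, priors = model
    X = np.asarray(X, dtype=float)
    # diff has shape (n_samples, n_classes, n_genes).
    diff = X[:, None, :] - means[None, :, :]
    # Score: minus the variance-scaled squared distance plus a prior term.
    # eps (an assumption of this sketch) guards against zero-variance genes.
    scores = -(diff ** 2 / (var + eps)).sum(axis=2) + 2.0 * np.log(priors)
    return classes[scores.argmax(axis=1)]


if __name__ == "__main__":
    # Synthetic data with p >> n: 20 samples, 200 genes, 2 classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 200))
    y = np.repeat([0, 1], 10)
    X[y == 1, :20] += 2.0  # the first 20 genes are differentially expressed
    model = fit_dlda(X, y)
    print(predict_dlda(model, X))
```

The diagonal quadratic variant would estimate a separate variance vector per class instead of pooling, and nearest shrunken centroids would additionally shrink the class means toward the overall mean so that uninformative genes drop out of the score.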

Index Terms—classification, discriminant analysis, gene expression, RNA sequencing, tumor classification

Cite: Gokmen Zararsiz, Selcuk Korkmaz, Dincer Goksuluk, Vahap Eldem, and Ahmet Ozturk, "Diagonal Discriminant Analysis for Gene-Expression Based Tumor Classification," Vol. 6, No. 2, pp. 59-62, May, 2015. doi:10.12720/jait.6.2.59-62