Journal of biomedical informatics
-
Data originating from biomedical experiments has provided machine learning researchers with an important source of motivation for developing and evaluating new algorithms. A new wave of algorithmic development has been initiated with the publication of gene expression data derived from microarrays. Microarray data analysis is particularly challenging given the large number of measurements (typically in the order of thousands) that are reported for relatively few samples (typically in the order of dozens). ⋯ In this article, we briefly review the biology underlying microarrays, the process of obtaining gene expression measurements, and the rationale behind the common types of analyses involved in a microarray experiment. We outline the main challenges and reiterate critical considerations regarding the construction of supervised learning models that use this type of data. The goal of this article is to familiarize machine learning researchers with data originated from gene expression microarrays.