Correspondence should be addressed to:
Quantitative Sciences, GlaxoSmithKline, Collegeville, PA 19426, USA
We developed a novel tool for microarray data analysis that parsimoniously discovers highly predictive genes by finding the optimal trade-off between fold change and t-test p-value through rigorous cross-validation. In addition to finding a small set of highly predictive genes, the tool includes a procedure that recursively discovers and removes predictive genes from the dataset until no such genes can be found. We applied the tool to a public breast cancer dataset with the goal of discovering genes that predict patients' response to preoperative chemotherapy. The results show that the estrogen receptor (ER) gene is the most important predictor of chemotherapeutic response and that no gene signature adds much clinical benefit for the whole patient population. We further identified a clinically homogeneous subgroup of patients (ER-negative, PR-negative, and HER2-negative) whose response to the chemotherapy can be reasonably predicted. Many of the predictive markers discovered for this subgroup were successfully validated using a blinded validation set.
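As a rough illustration of the threshold trade-off described above, the following minimal sketch tunes a fold-change cutoff and a p-value cutoff jointly by cross-validation. It is not the tool itself: the function names, the nearest-centroid classifier, the interleaved fold layout, and the normal approximation to the t-test p-value are all assumptions made to keep the example self-contained. Expression values are assumed to be on a log2 scale, labels are 0/1, and each class is assumed to have at least two samples in every training fold.

```python
import math
from statistics import mean, variance


def gene_scores(X, y):
    """Per gene: (difference of class means, approximate two-sided p-value)."""
    pos = [row for row, lab in zip(X, y) if lab == 1]
    neg = [row for row, lab in zip(X, y) if lab == 0]
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row in pos]
        b = [row[g] for row in neg]
        diff = mean(a) - mean(b)  # log2 fold change (assumes log2 data)
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        t = diff / se if se > 0 else 0.0  # Welch-style t statistic
        # Assumption: normal approximation stands in for the t distribution.
        p = math.erfc(abs(t) / math.sqrt(2))
        scores.append((diff, p))
    return scores


def select_genes(X, y, min_fc, max_p):
    """Indices of genes passing both the fold-change and p-value cutoffs."""
    return [g for g, (fc, p) in enumerate(gene_scores(X, y))
            if abs(fc) >= min_fc and p <= max_p]


def cv_accuracy(X, y, min_fc, max_p, k=4):
    """k-fold accuracy; genes are re-selected inside every training fold."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved folds (hypothetical)
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        genes = select_genes(train_X, train_y, min_fc, max_p)
        if not genes:
            continue  # nothing selected: held-out samples count as misses
        centroids = {}
        for lab in (0, 1):
            rows = [r for r, l in zip(train_X, train_y) if l == lab]
            centroids[lab] = [mean([r[g] for r in rows]) for g in genes]
        for i in test_idx:
            dist = {lab: sum((X[i][g] - c) ** 2
                             for g, c in zip(genes, centroids[lab]))
                    for lab in (0, 1)}
            correct += int(min(dist, key=dist.get) == y[i])
    return correct / n


def tune_thresholds(X, y, fc_grid, p_grid, k=4):
    """Grid search; strictest cutoffs are tried first so ties favor parsimony."""
    best = None
    for min_fc in sorted(fc_grid, reverse=True):
        for max_p in sorted(p_grid):
            acc = cv_accuracy(X, y, min_fc, max_p, k)
            if best is None or acc > best[0]:
                best = (acc, min_fc, max_p)
    return best
```

Calling `tune_thresholds(X, y, [0.5, 1.0, 2.0], [0.001, 0.01, 0.05])` returns the best cross-validated accuracy together with the cutoff pair that achieved it. Selecting genes inside each training fold, rather than once on the full data, keeps the accuracy estimate from being inflated by selection bias.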