Pre-processed feature ranking for a support vector machine

2009

Abstract

A computer-implemented method is provided for ranking the features of a dataset that contains a large number of features according to each feature's ability to separate the data into classes. For each feature, a support vector machine separates the dataset into two classes and determines the margin between the extremal points of the two classes. The margins for all of the features are compared, and the features are ranked by margin size, with the highest-ranked features corresponding to the largest margins. A subset of features for classifying the dataset is then selected from a group of the highest-ranked features. In one embodiment, the method is used to identify the best genes for disease prediction and diagnosis using gene expression data from microarrays.
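
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes labels in {+1, -1} and relies on the fact that, for a single feature whose classes are separable by a threshold, the hard-margin SVM margin width equals the gap between the closest points of opposite classes. The function names `feature_margins` and `rank_features`, the toy data, and the convention of reporting a negative value when the classes overlap are all assumptions made for illustration.

```python
def feature_margins(X, y):
    """Return one margin per feature (column of X); labels y are +1 or -1.

    Assumption: the margin of a single feature is the gap between the
    extremal points of the two classes. It is positive when the feature
    separates the classes and negative (an illustrative convention) when
    the class ranges overlap.
    """
    n_features = len(X[0])
    margins = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        # Try both orientations (positives above or below negatives).
        gap = max(min(pos) - max(neg), min(neg) - max(pos))
        margins.append(gap)
    return margins


def rank_features(X, y, k):
    """Rank features by margin, largest first, and keep the top k indices."""
    margins = feature_margins(X, y)
    order = sorted(range(len(margins)), key=lambda j: margins[j], reverse=True)
    return order[:k], margins


# Toy data: 4 samples, 3 features (hypothetical values).
X = [
    [0.0, 1.0, 5.0],
    [1.0, 3.0, 6.0],
    [4.0, 2.0, 5.5],
    [5.0, 4.0, 6.5],
]
y = [-1, -1, 1, 1]

top, margins = rank_features(X, y, k=2)
print(top, margins)
```

In this toy example only the first feature separates the two classes, so it receives the largest margin and is ranked first. A subset for a downstream classifier would then be drawn from the top of this ranking.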

Authors

Jason Weston
A Elisseeff
Bernhard Schölkopf
Isabelle Guyon