
Support Vector Machines (SVMs)

Lirong Tan, Ph.D. student

Advisors: Dr. Jason Lu and Dr. Scott K. Holland

August 12th

Lirong Tan (CCHMC) PNRC Neuroimaging Training Course 2013-8-12 1 / 38


Outline

1 What can SVM do? Introduce some concepts with 2-dimensional artificial data.

2 How to use SVM? Model training/selection/evaluation

3 Practice with a UCI dataset.

4 SVM regression (SVR)

5 Feature extraction from fMRI data



Applications for SVM

1 Classification:
1 Predict whether a house is expensive or not
2 Brain state decoding: lie detection
3 Automatic disease diagnosis: early detection of Alzheimer's disease

2 Regression:
1 Predict the house price
2 Predict cochlear implantation (CI) outcomes
3 Predict the reading gains in children with dyslexia



Introduction

1 SVM is a multivariate method.

2 Input for SVM: a training set

$$D = \{(x^{(i)}, y^{(i)}) \mid x^{(i)} \in \mathbb{R}^m\}_{i=1}^{N} \quad (1)$$

where $x^{(i)}$ is called the feature vector, the superscript $(i)$ indicates the
i-th training sample, and $x^{(i)}$ is an m-dimensional vector

$$x^{(i)} = [x_1^{(i)}, \dots, x_j^{(i)}, \dots, x_m^{(i)}] \quad (2)$$

$x_j^{(i)}$ is the j-th feature/independent variable for the i-th subject. $y^{(i)}$
is called the label/dependent variable.

3 Given the training set, SVM is able to learn the rules to predict y.



Intuition for SVM
SVM is also known as a large-margin classifier.



libsvm download and installation

1 Download: http://www.csie.ntu.edu.tw/~cjlin/libsvm/oldfiles/index-1.0.html

2 Documents: http://www.csie.ntu.edu.tw/~cjlin/libsvm/

3 Unzip

4 Move the unzipped folder to the location where you want to install libsvm, e.g.
C:\users\MATLAB\tools

5 Add that folder to the MATLAB path



Demo Code

How do we train an SVM model? How do we use the trained model to predict new
samples whose true labels are not known?

clc
clear
load data.mat;                      % expected to provide data, label, and test
% Scale every feature to [0,1] using the min and max of the training data.
m = min(data);
n = max(data);
data = (data - repmat(m,size(data,1),1)) ./ repmat(n-m,size(data,1),1);
test = (test - m) ./ (n - m);
% Plot the two classes and the new sample.
plot(data(1:10,1),data(1:10,2),'o','MarkerSize',12);
hold on
plot(data(11:21,1),data(11:21,2),'r*','MarkerSize',12);
plot(test(1),test(2),'k*','MarkerSize',12);
% Train a linear C-SVC and predict.
model = svmtrain(label, data, '-s 0 -t 0 -c 10');
[predicted_label, accuracy, decision_score] = svmpredict(label, data, model);
% Decision boundary: w(1)*x + w(2)*y - rho = 0.
w = model.SVs' * model.sv_coef;
x0 = [0,1];
y0 = zeros(1,2);
y0(1) = (model.rho - x0(1)*w(1))/w(2);
y0(2) = (model.rho - x0(2)*w(1))/w(2);
plot(x0,y0,'k');
axis([0,1,0,1.2])



Feature scaling
1 In SVM training, we need to scale the features. Otherwise, features
with a large range will dominate the training.

2 Two ways to scale features:

1 Linearly scale the features to the range [0,1] or [-1,1]:
$$x' = \frac{x - \min}{\max - \min} \quad \text{(range } [0,1]\text{)}$$
$$x' = \frac{2(x - \min)}{\max - \min} - 1 \quad \text{(range } [-1,1]\text{)}$$
2 Normalize the feature to zero mean and unit variance (zscore
function in MATLAB):
$$x' = \frac{x - \text{mean}}{\text{standard deviation}}$$

3 Scale the training and test samples in the same way: use the min,
max, mean, and std from the training samples to scale the test samples.
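The scaling rules above can be sketched in a few lines of base MATLAB. The toy matrices `train` and `test` below are hypothetical; the point is only that the statistics are computed on the training samples and then reused for the test samples, so scaled test values may fall outside [0,1].

```matlab
% Hypothetical toy data: rows are samples, columns are features.
train = [1 100; 2 300; 3 200; 5 500];
test  = [4 400; 0 600];     % test values may lie outside the training range

% Min-max scaling: statistics come from the training samples only.
mi = min(train, [], 1);
Mi = max(train, [], 1);
trainScaled = (train - repmat(mi, size(train,1), 1)) ./ repmat(Mi - mi, size(train,1), 1);
testScaled  = (test  - repmat(mi, size(test,1),  1)) ./ repmat(Mi - mi, size(test,1),  1);

% z-score alternative, again with the training mean and standard deviation.
mu = mean(train, 1);
sd = std(train, 0, 1);
trainZ = (train - repmat(mu, size(train,1), 1)) ./ repmat(sd, size(train,1), 1);
testZ  = (test  - repmat(mu, size(test,1),  1)) ./ repmat(sd, size(test,1),  1);
```

Here `testScaled` contains a value below 0 and one above 1, which is expected when a test sample lies outside the training range.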
parameter -s
-s svm_type : set type of SVM (default 0)
0 C-SVC
1 nu-SVC
2 one-class SVM
3 epsilon-SVR
4 nu-SVR

C-SVC and nu-SVC solve essentially the same problem with different
parameters (c versus nu). You can choose either one.

Black boundary: -s 0 -t 0 -c 10
Green boundary: -s 1 -t 0 -n 0.7



parameter -t
-t kernel_type : set type of kernel function (default 2)
0 linear: u'*v
1 polynomial: (gamma*u'*v + coef0)^degree
2 radial basis function: exp(-gamma*|u-v|^2)
3 sigmoid: tanh(gamma*u'*v + coef0)

Linear kernel: No kernel. The decision boundary is a line/hyperplane in the
original space. The decision rule is as follows:

$$y = \begin{cases} 1, & \text{if } z \ge 0 \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

where $z = w^T x + \text{bias} = w_1 x_1 + \dots + w_m x_m + \text{bias}$.

You may choose the linear kernel when the number of features is large and
the number of training samples is small. Another advantage of the linear
kernel is that the model is easy to interpret.
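As a minimal sketch of the linear decision rule in Equation 3, the snippet below applies a weight vector and bias to a few samples. The values of `w`, `bias`, and `X` are hypothetical and chosen only for illustration; in practice they would come from a trained model.

```matlab
% Hypothetical weights and bias for a model with m = 3 features.
w    = [0.5; -1.0; 2.0];
bias = -0.25;

% Each row of X is one sample.
X = [1.0 0.5 0.0;    % z = 0.5 - 0.5 + 0.0 - 0.25 = -0.25
     0.0 0.0 1.0;    % z = 2.0 - 0.25             =  1.75
     1.0 0.0 0.0];   % z = 0.5 - 0.25             =  0.25

z = X * w + bias;    % z = w'*x + bias for every sample at once
y = double(z >= 0);  % predicted labels (1 or 0)
```

Because the rule is a weighted sum, each weight can be read directly as the contribution of one feature, which is what makes the linear model easy to interpret.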
parameter -t
Radial basis function kernel (RBF):
1 Another parameter needs to be set, gamma (libsvm option -g). If gamma is too
small, the model may be underfitted. If gamma is too large, it may be overfitted.

2 Choose the RBF kernel when the number of features is small and the
number of training samples is large.

3 Make sure you have performed feature scaling before you use the RBF
kernel. Otherwise, exp(-gamma*|u-v|^2) will be dominated by features
with a large range.
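To illustrate why unscaled features dominate the RBF kernel, here is a small sketch with hypothetical samples in which one feature lies in [0,1] and the other is in the thousands. The value of gamma and the scaling range of 2000 are assumptions made for the example.

```matlab
% RBF kernel between two row vectors: exp(-gamma*|u-v|^2).
rbf = @(u, v, g) exp(-g * sum((u - v).^2));

% Hypothetical samples: feature 1 lies in [0,1], feature 2 is in the thousands.
a = [0.1 1000];
b = [0.9 1000];   % differs from a only in feature 1
c = [0.1 1200];   % differs from a only in feature 2

g = 0.5;                    % assumed gamma
k_ab = rbf(a, b, g);        % exp(-0.5 * 0.64): the difference is still visible
k_ac = rbf(a, c, g);        % exp(-0.5 * 40000): underflows to 0

% After dividing feature 2 by an assumed range of 2000, both features
% contribute on a comparable scale.
s = [1 2000];
k_ac_scaled = rbf(a ./ s, c ./ s, g);   % exp(-0.5 * 0.01)
```

Without scaling, any two samples that differ in feature 2 look completely dissimilar to the kernel, regardless of feature 1.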
parameter -t

Other suggestions:

1 Recent research shows that if RBF is used with model selection, then
there is no need to consider the linear kernel.

2 The polynomial and sigmoid kernels are used less often in practice.

3 With libsvm, you may define your own kernel function and feed the
precomputed kernel matrix to the SVM (kernel type -t 4).
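A minimal sketch of building a precomputed kernel matrix is shown below. The toy data and the hand-written RBF kernel are assumptions for illustration; the layout with the sample serial number in the first column follows the format described in the libsvm MATLAB README for kernel type 4. The `svmtrain` call is left commented out so the snippet runs without libsvm installed.

```matlab
% Hypothetical toy training data: 4 samples, 2 features.
X = [0 0; 0 1; 1 0; 1 1];
label = [0; 0; 1; 1];
n = size(X, 1);

% User-defined kernel: an RBF kernel written out by hand (gamma is assumed).
g  = 0.5;
sq = sum(X.^2, 2);
D2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2 * (X * X');   % squared distances
K  = exp(-g * D2);                                          % n-by-n kernel matrix

% The first column holds the sample serial numbers 1..n.
Kin = [(1:n)', K];

% With libsvm on the MATLAB path, training would look like:
% model = svmtrain(label, Kin, '-t 4');
```

Any symmetric positive semi-definite similarity matrix could replace `K` here, which is what makes this route useful for custom kernels.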



parameter -c

1 Objective Function:

$$C \sum_{i=1}^{N} \left[ y_i \, \text{cost}_1 + (1 - y_i) \, \text{cost}_0 \right] + \sum_{j=1}^{M} w_j^2 \quad (4)$$

where N is the total number of training samples and M is the number of
features.



parameter -c

2 The first term penalizes the samples that have been classified wrongly.

3 We can prove that

$$\text{Margin} \propto \frac{1}{\sqrt{\sum_{j=1}^{M} w_j^2}}$$

SVM minimizes the objective function defined in Equation 4, so it
maximizes the margin.

4 Therefore, when c is very large, the SVM tends to classify all training
samples correctly. When c is not too large, the SVM tends to ignore the
outliers and find the decision boundary with the maximal margin.
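The relation between the margin and the weights can be sketched as follows. This assumes the usual scaling in which the samples closest to the boundary satisfy $w^T x + b = \pm 1$; that scaling is an assumption added here and is not stated on the slide.

```latex
\begin{aligned}
\text{distance from } x \text{ to the boundary} &= \frac{|w^T x + b|}{\lVert w \rVert}, \\
\text{Margin} &= \frac{1}{\lVert w \rVert} + \frac{1}{\lVert w \rVert}
              = \frac{2}{\sqrt{\sum_{j=1}^{M} w_j^2}}.
\end{aligned}
```

Making $\sum_{j} w_j^2$ smaller therefore makes the margin larger, which is why the second term of Equation 4 acts as the margin term.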



parameter -c
Black Boundary: -s 0 -t 0 -c 10
Green Boundary: -s 0 -t 0 -c 10000



Model Evaluation

Use samples with true labels to assess the performance of the model.

Sample group       Category              True Label   Predicted Label
Positive samples   True Positive (TP)    positive     positive
Positive samples   False Negative (FN)   positive     negative
Negative samples   True Negative (TN)    negative     negative
Negative samples   False Positive (FP)   negative     positive

$$\text{sensitivity} = \frac{TP}{TP + FN} \quad (5)$$

$$\text{specificity} = \frac{TN}{TN + FP} \quad (6)$$

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (7)$$
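Equations 5 to 7 can be computed directly from the true and predicted label vectors. The labels below are hypothetical, with 1 denoting the positive class and 0 the negative class.

```matlab
% Hypothetical true and predicted labels (1 = positive, 0 = negative).
trueLabel      = [1 1 1 1 0 0 0 0 0 0]';
predictedLabel = [1 1 1 0 0 0 0 0 1 1]';

% Count the four categories of the table above.
TP = sum(trueLabel == 1 & predictedLabel == 1);
FN = sum(trueLabel == 1 & predictedLabel == 0);
TN = sum(trueLabel == 0 & predictedLabel == 0);
FP = sum(trueLabel == 0 & predictedLabel == 1);

sensitivity = TP / (TP + FN);                    % Equation 5
specificity = TN / (TN + FP);                    % Equation 6
accuracy    = (TP + TN) / (TP + TN + FP + FN);   % Equation 7
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unbalanced, since accuracy alone can hide poor performance on the smaller class.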



Model Evaluation

1 Plot the receiver operating characteristic (ROC) curve, and calculate
the area under the curve (AUC).

2 AUC evaluates how the model ranks the samples. For example, if the
predicted scores for negative samples are always smaller than the
predicted scores of positive samples, this model tends to have a high
AUC.

[X,Y,THRE,AUC] = perfcurve(labels,scores,posclass);
plot(X,Y,'.')
xlabel('False Positive Rate (1 - specificity)');
ylabel('True Positive Rate (sensitivity)');
title(['AUC = ', num2str(AUC)]);    % show the computed AUC in the title
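The ranking interpretation of AUC can be made concrete without any toolbox: AUC is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The labels and scores below are hypothetical.

```matlab
% Hypothetical scores; a higher score means "more likely positive".
labels = [1 1 1 0 0 0 0]';
scores = [0.9 0.8 0.3 0.7 0.2 0.1 0.05]';

pos = scores(labels == 1);
neg = scores(labels == 0);

% Compare every positive score with every negative score.
P = repmat(pos, 1, numel(neg));
Q = repmat(neg', numel(pos), 1);
nPairs = numel(pos) * numel(neg);
AUC = (sum(P(:) > Q(:)) + 0.5 * sum(P(:) == Q(:))) / nPairs;
```

In this example only one pair is ranked wrongly (the positive sample scored 0.3 against the negative sample scored 0.7), so 11 of the 12 pairs are ordered correctly.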



Model Evaluation
Each vertical line segment represents a positive sample, and each
horizontal line segment represents a negative sample.



Model Selection & Evaluation

We usually partition the data into 3 parts:


1 Training set (60%): used for training
2 Cross-validation set (20%): used for model selection
3 Testing set (20%): used for model evaluation

Procedure:
1 Train models with different parameters ($c = 2^{-2}, 2^{-1}, 2^{0}, 2^{1}, 2^{2}$)
2 Test the models on the cross-validation set, and pick the model with
the highest AUC/accuracy
3 Test the picked model on the testing set, and report the performance
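The 60%/20%/20% partition can be sketched with a random permutation of the sample indices. The sample count `N` is hypothetical, and the index vectors would then be used to select rows of the feature matrix and label vector.

```matlab
N   = 50;                % hypothetical number of samples
idx = randperm(N);       % random order of the sample indices

nTrain = round(0.6 * N);
nCV    = round(0.2 * N);

trainIdx = idx(1:nTrain);                   % used for training
cvIdx    = idx(nTrain+1 : nTrain+nCV);      % used for model selection
testIdx  = idx(nTrain+nCV+1 : end);         % used only for the final evaluation
```

The three index sets are disjoint, so the testing set plays no part in choosing c.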



Model Selection/Grid Search
With the RBF kernel, we have two parameters (c and gamma), so we use grid search.

Each grid point corresponds to a combination of c and gamma.

Try all the combinations defined on the grid. Choose the model that
gives the best performance on the cross-validation set.
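The grid search loop can be sketched as below. The function handle `cvScore` is a hypothetical stand-in for "train with this c and gamma, then return the AUC or accuracy on the cross-validation set"; it is a made-up smooth function so that the snippet runs without libsvm. The candidate ranges are also assumptions.

```matlab
cs = 2 .^ (-2:2);     % candidate values of c
gs = 2 .^ (-3:1);     % candidate values of gamma (assumed range)

% Hypothetical stand-in for the cross-validation performance of a model.
cvScore = @(c, g) -(log2(c) - 1).^2 - (log2(g) + 2).^2;

scores = zeros(numel(cs), numel(gs));
best = -Inf;
for i = 1:numel(cs)
    for j = 1:numel(gs)
        scores(i, j) = cvScore(cs(i), gs(j));   % one grid point
        if scores(i, j) > best
            best  = scores(i, j);
            bestC = cs(i);
            bestG = gs(j);
        end
    end
end
```

Spacing the candidates as powers of 2 covers several orders of magnitude with few grid points; a finer grid around the best point can follow if needed.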
Model Selection & Evaluation

When the number of samples is limited, we use the technique called
cross-validation.

Take 3-fold cross-validation as an example. We randomly partition the
data into 3 parts.
1 Fold 1: use part 1 for testing, and parts 2 and 3 for training
2 Fold 2: use part 2 for testing, and parts 1 and 3 for training
3 Fold 3: use part 3 for testing, and parts 1 and 2 for training

The model is assessed based on its performance on all the testing samples
across Folds 1, 2, and 3.
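A random K-fold partition can be sketched in base MATLAB as follows. The values of `N` and `K` are hypothetical, and the training and prediction step inside the loop is left as a comment.

```matlab
N = 10;                  % hypothetical number of samples
K = 3;                   % number of folds

% Assign each sample to a fold at random; fold sizes differ by at most one.
perm = randperm(N);
fold = zeros(N, 1);
fold(perm) = mod(0:N-1, K) + 1;

timesTested = zeros(N, 1);
for k = 1:K
    testIdx  = find(fold == k);    % part k is used for testing
    trainIdx = find(fold ~= k);    % the remaining parts are used for training
    % Train on trainIdx and predict testIdx here, storing the predictions.
    timesTested(testIdx) = timesTested(testIdx) + 1;
end
```

Every sample is tested exactly once, so the stored predictions cover the whole data set and can be evaluated together.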



Model Selection & Evaluation

In the neuroimaging field, the number of samples is usually limited. In this
situation, we usually use the technique called Leave-One-Out
Cross-Validation (LOOCV).

Suppose we have 5 samples:

1 Fold 1: use the 1st sample for testing, and the rest for training
2 Fold 2: use the 2nd sample for testing, and the rest for training
3 Fold 3: use the 3rd sample for testing, and the rest for training
4 Fold 4: use the 4th sample for testing, and the rest for training
5 Fold 5: use the 5th sample for testing, and the rest for training

The model is evaluated based on its performance on the 5 testing samples.



Model Selection & Evaluation

If we need to select parameters, we may use a two-level (nested) LOOCV to make
the evaluation more objective.

Suppose we have 5 samples:

1 Fold 1: hold out the 1st sample for testing, and perform a LOOCV on the rest.
Use this inner LOOCV for model selection, and then apply the selected
model to the 1st sample for testing.
2 Fold 2: the same procedure with the 2nd sample held out for testing
3 Fold 3: the same procedure with the 3rd sample held out for testing
4 Fold 4: the same procedure with the 4th sample held out for testing
5 Fold 5: the same procedure with the 5th sample held out for testing

Report the performance on the 5 testing samples.



UCI Dataset Information

1 Iris dataset from http://archive.ics.uci.edu/ml/

2 The data set contains 3 classes of 50 instances each, where each class
refers to a type of iris plant.

3 Attribute information:
1 sepal length in cm
2 sepal width in cm
3 petal length in cm
4 petal width in cm

4 Class information: Setosa, Versicolour, Virginica



Demo Code: Two-folds of LOOCV

clc
clear
load iris.mat;                      % expected to provide features and label
% Keep classes 2 and 3 only, and recode them as 1 and 0.
features = features(logical(label~=1),:);
label = label(logical(label~=1));
label(logical(label==2)) = 1;
label(logical(label==3)) = 0;
sampleN = length(label);
score = zeros(sampleN,1);
cs = -10:10;                        % candidate exponents: c = 2^cs(j)
for i = 1:sampleN
    % Outer LOOCV: hold out sample i for testing.
    train = features([1:i-1,i+1:end],:);
    test = features(i,:);
    [train,test] = func_scale(train,test);
    trainLabel = label([1:i-1,i+1:end]);
    testLabel = label(i);
    % Inner LOOCV on the training samples for model selection.
    AUCs = zeros(length(cs),1);
    for j = 1:length(cs)
        AUCs(j) = func_LOOCV(train,trainLabel,2^(cs(j)));
    end
    [tmp,j] = max(AUCs);
    model = svmtrain(trainLabel, train, ['-s 0 -t 0 -c ',num2str(2^(cs(j)))]);
    [predicted_label, accuracy, score(i)] = svmpredict(testLabel, test, model);
end
[X,Y,THRE,AUC] = perfcurve(label,score,1);
plot(X,Y,'.');



Demo Code: functions func_scale() & func_LOOCV()

function [train,test] = func_scale(train,test)
% Scale train and test to [0,1] using the min and max of the training samples.
Mi = max(train);
mi = min(train);
m = size(train,1);
n = size(test,1);
train = (train - repmat(mi,m,1)) ./ (repmat(Mi-mi,m,1));
test = (test - repmat(mi,n,1)) ./ (repmat(Mi-mi,n,1));
end

function AUC = func_LOOCV(data,label,c)
% Run a LOOCV with a linear C-SVC and cost c, and return the resulting AUC.
sampleN = length(label);
score = zeros(sampleN,1);
for i = 1:sampleN
    train = data([1:i-1,i+1:end],:);
    test = data(i,:);
    trainLabel = label([1:i-1,i+1:end]);
    testLabel = label(i);
    model = svmtrain(trainLabel, train, ['-s 0 -t 0 -c ',num2str(c)]);
    [predicted_label, accuracy, score(i)] = svmpredict(testLabel, test, model);
end
[X,Y,THRE,AUC] = perfcurve(label,score,1);
end



Demo Code: Multiple classes & 10-folds of cross-validation

clc
clear
load iris.mat;                      % expected to provide features and label
% Assign fold indices separately for each block of 50 samples
% (one block per class, assuming the samples are ordered by class).
cvidx = [crossvalind('Kfold', 50, 10); crossvalind('Kfold', 50, 10); ...
         crossvalind('Kfold', 50, 10)];
trueLabel = [];
predictedLabel = [];
for i = 1:10
    train = features(logical(cvidx~=i),:);
    test = features(logical(cvidx==i),:);
    trainLabel = label(logical(cvidx~=i));
    testLabel = label(logical(cvidx==i));
    trueLabel = [trueLabel;testLabel];
    [train,test] = func_scale(train,test);
    model = svmtrain(trainLabel, train, '-s 0 -t 0 -c 1');
    [predicted_label, accuracy, decision_score] = svmpredict(testLabel, test, model);
    predictedLabel = [predictedLabel;predicted_label];
end
accuracy = sum(predictedLabel==trueLabel)/length(trueLabel);



Model Interpretation

In linear SVM, we can use the weights to measure the importance of the
features.

Let's start from the binary case and assume you have two labels, 0 and 1.
After obtaining the model by calling svmtrain, do the following to get
w and b:

w = model.SVs' * model.sv_coef;
b = -model.rho;
% libsvm treats the first label it sees as the positive side, so flip
% the signs if that label is 0.
if model.Label(1) == 0
    w = -w;
    b = -b;
end

The larger the absolute value of w(i), the more important the i-th feature.

Model: $y = w^T x + b$
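Ranking the features by the absolute value of their weights can be sketched as below. The weight vector is hypothetical; in practice it would be the `w` obtained from the trained linear model, and the comparison is only meaningful when the features were scaled to a common range before training.

```matlab
% Hypothetical weight vector of a linear SVM with 4 features.
w = [0.2; -1.5; 0.7; -0.1];

% Sort the features from most to least important.
[absW, order] = sort(abs(w), 'descend');

% order(1) is the index of the most important feature.
mostImportant = order(1);
```

The sign of each weight still carries information: it indicates which class a larger feature value pushes the prediction toward.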



SVR linear model

clc
clear
% Simulate a noisy linear relation y = 2*x + 3.
N = 1000;
x = randn(N,1);
y = 2*x + randn(N,1) + 3;
% Scale x to [0,1].
Mi = max(x);
mi = min(x);
x = (x - mi) ./ (Mi - mi);
plot(x,y,'.')
% Train an epsilon-SVR with a linear kernel.
model = svmtrain(y,x,'-s 3 -t 0 -c 4');
[predicted_label, accuracy, decision_score] = svmpredict(y, x, model);
w = model.SVs' * model.sv_coef;
b = -model.rho;
hold on
plot([min(x),max(x)],[min(x)*w+b,max(x)*w+b],'r','LineWidth',2);
xlabel('x');
ylabel('y');
title('SVR linear model');



SVR linear model



SVR non-linear model

clc
clear
% Simulate a noisy quadratic relation.
N = 1000;
x = randn(N,1);
y = (x - 0.5).^2 + randn(N,1);
% Scale x to [0,1].
Mi = max(x);
mi = min(x);
x = (x - mi) ./ (Mi - mi);
plot(x,y,'.')
% Train an epsilon-SVR with an RBF kernel.
model = svmtrain(y,x,'-s 3 -t 2 -c 8');
[predicted_label, accuracy, decision_score] = svmpredict(y, x, model);
hold on
plot(x,predicted_label,'r*')
xlabel('x');
ylabel('y');
title('SVR nonlinear model');



SVR non-linear model



SVR multi-dimensional data & grid search

clc
clear
load moore.mat;
data = moore;
label = data(:,end);                % last column is the dependent variable
data = data(:,1:end-1);
% Scale every feature to [0,1].
Mi = max(data);
mi = min(data);
m = size(data,1);
data = (data - repmat(mi,m,1)) ./ (repmat(Mi-mi,m,1));

% Grid over c, epsilon (-p), and gamma (-g), each taken as 2^cs.
cs = -15:15;
rs = zeros(length(cs)^3,1);
MSEs = zeros(length(cs)^3,1);
idx = 0;
ijk = zeros(length(cs)^3,3);
for i = 1:length(cs)
    for j = 1:length(cs)
        for k = 1:length(cs)
            idx = idx+1;
            model = svmtrain(label, data, ['-s 3 -t 2 -c ',num2str(2^(cs(i))), ...
                ' -p ',num2str(2^(cs(j))),' -g ',num2str(2^(cs(k)))]);
            [predicted_label, accuracy, decision_score] = svmpredict(label, data, model);
            MSEs(idx) = accuracy(2);    % mean squared error
            rs(idx) = accuracy(3);      % squared correlation coefficient
            ijk(idx,:) = [i,j,k];
        end
    end
end



Feature Extraction from fMRI data

How to extract features from fMRI data? This is the most important
question for multivariate analysis.

1 Use the fMRI time series directly. The number of features would be:
the number of voxels * the number of time points

2 Construct a contrast map; each voxel becomes a feature.

3 Define ROIs; each ROI is measured as the mean contrast value of the
voxels within this ROI.

4 Use ICA time series. The number of features would be: the number of ICs *
the number of time points.

5 Construct a brain network first, then extract network features.
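The first three options can be sketched on a synthetic array. Everything below is hypothetical: the data are made-up numbers for one subject, the mean over time is only a stand-in for a real contrast estimate, and `roi` is an assumed voxel-to-ROI assignment.

```matlab
% Hypothetical data for one subject: 6 voxels, 4 time points.
nVox = 6;
nTime = 4;
ts = reshape(1:nVox*nTime, nVox, nTime);

% Option 1: use the time series directly -> nVox * nTime features.
f1 = reshape(ts, 1, []);

% Option 2: a contrast map -> one feature per voxel.
% The mean over time is a stand-in for a real contrast value.
contrastMap = mean(ts, 2)';

% Option 3: ROI means. roi(v) is the assumed ROI index of voxel v.
roi = [1 1 2 2 3 3];
nROI = max(roi);
f3 = zeros(1, nROI);
for r = 1:nROI
    f3(r) = mean(contrastMap(roi == r));
end
```

Each option yields one row vector per subject; stacking these rows across subjects gives the N-by-m feature matrix that the SVM takes as input. The feature count drops from 24 to 6 to 3 in this toy case, which matters when the number of subjects is small.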



Feature Extraction from sMRI data



Feature Extraction from fMRI data



Important Features

1 Classification problem:
normal hearing (NH) vs.
hearing impaired (HI)

2 Classifier: linear SVM



Thank you!

