If we want to partition these two classes using a line (or hyperplane), the green hyperplane separates them with the maximum margin. The data points that determine the margin are called support vectors. In other words, these are the points that lie closest to the decision surface.

Figure: Support Vectors
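To make the idea concrete, here is a minimal sketch on a made-up, linearly separable toy set. The arrays `X_toy` and `y_toy` are illustrative assumptions, not the Titanic data. It fits a linear `SVC` with a large `C` to approximate a hard margin, then reads off the support vectors and the margin width `2 / ||w||`.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two well-separated clusters in 2-D
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no room for margin violations (near hard margin)
toy_svc = svm.SVC(kernel='linear', C=1000.0)
toy_svc.fit(X_toy, y_toy)

# The training points that lie on the margin
print(toy_svc.support_vectors_)

# For a linear kernel, the margin width is 2 / ||w||
w = toy_svc.coef_[0]
margin = 2.0 / np.linalg.norm(w)
print("Margin width: %0.3f" % margin)
```

Only the points returned in `support_vectors_` shape the boundary. Nudging any other point, as long as it stays outside the margin, leaves the fitted hyperplane unchanged.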

Equations
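As a reference point, the standard textbook formulation can be written as follows. The notation here (weights $w$, bias $b$, labels $y_i \in \{-1, +1\}$, slack variables $\xi_i$) is an assumption and may differ from the notation used elsewhere. For separable data, the maximum-margin hyperplane solves

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n
$$

and the resulting margin width is $2 / \lVert w \rVert$. When the classes overlap, the soft-margin version allows violations through the slack variables:

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
$$

Here $C$ plays the role of the `C` hyperparameter in scikit-learn's `SVC`: a larger value penalizes margin violations more heavily.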

## Advantages

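- Performs well in high-dimensional spaces, including when the number of features p is large relative to the number of samples n
- Memory efficient, because the decision function uses only a subset of the training points (the support vectors)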

## Disadvantages

- Probability estimates need to be derived indirectly
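- Prone to overfitting when the number of features is much larger than the number of samples, so the kernel and the regularization term need to be chosen carefully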
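The first point can be illustrated with a short sketch. An SVM natively outputs signed distances to the hyperplane, not probabilities. In scikit-learn, setting `probability=True` makes `SVC` fit an extra calibration step using internal cross-validation, which is slower. The synthetic data from `make_classification` and the names `X_demo`, `y_demo` and `prob_svc` are assumptions used only so the block runs on its own.

```python
from sklearn import svm
from sklearn.datasets import make_classification

# Synthetic stand-in data with three features (not the Titanic data)
X_demo, y_demo = make_classification(n_samples=200, n_features=3,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

# probability=True adds an internal calibration step at fit time
prob_svc = svm.SVC(kernel='linear', probability=True, random_state=0)
prob_svc.fit(X_demo, y_demo)

# Native output: signed distance of each sample to the hyperplane
distances = prob_svc.decision_function(X_demo[:5])

# Derived output: calibrated class probabilities, one column per class
proba = prob_svc.predict_proba(X_demo[:5])

print(distances)
print(proba)
```

Because the probabilities come from a separate calibration fit, they may occasionally disagree with the labels returned by `predict`.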

## Example

The data used to demonstrate SVM comes from the Titanic dataset. For simplicity, I have used only three features (Age, fare and pclass).

## 1) SVM with linear kernel

We will see how to fit an SVM with a linear kernel and how to perform cross-validation (CV).

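```python
# Imports
from sklearn import svm
from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to be loaded already.
# Flatten y into a 1-D array
y = y.values.ravel()

# Split into training and test sets (80/20)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# SVC with linear kernel
svc = svm.SVC(kernel='linear')

# Fit on the training set
svc.fit(x_train, y_train)
```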

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```

```python
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score

# 5-fold cross-validation scores on the training set
scores = cross_val_score(svc, x_train, y_train, cv=5)

# Prediction and accuracy on the held-out test set
y_pred = svc.predict(x_test)
accuracy_test = accuracy_score(y_test, y_pred)

# Print the summary: mean CV accuracy with a +/- 2 standard deviation band
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print("Test Accuracy: %0.2f" % (accuracy_test))
```

```
Accuracy: 0.69 (+/- 0.03)
Test Accuracy: 0.70
```

## 2) SVM with polynomial kernel

```python
# SVC with polynomial kernel of degree 2
svc = svm.SVC(kernel='poly', degree=2)

# Fit on the training set
svc.fit(x_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=2, gamma='auto', kernel='poly',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```

## 3) SVM with rbf kernel

```python
# SVC with RBF kernel
svc = svm.SVC(kernel='rbf')

# Fit on the training set
svc.fit(x_train, y_train)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
```
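The three kernels can also be compared side by side with cross-validation. The sketch below does this on a synthetic stand-in dataset from `make_classification`, an assumption made so that the block runs on its own. To use the Titanic features instead, swap in `x_train` and `y_train`. The dictionary `kernel_scores` is a hypothetical name.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with three features (not the Titanic data)
X_demo, y_demo = make_classification(n_samples=300, n_features=3,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

kernel_scores = {}
for kernel in ('linear', 'poly', 'rbf'):
    # degree is only used by the polynomial kernel and ignored otherwise
    model = svm.SVC(kernel=kernel, degree=2)
    # Mean accuracy over 5 cross-validation folds
    kernel_scores[kernel] = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    print("%s kernel, mean CV accuracy: %0.2f" % (kernel, kernel_scores[kernel]))
```

Which kernel comes out ahead depends on the data, so the ranking on this synthetic set says nothing about the Titanic features.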

## 4) LinearSVC with linear kernel

```python
# LinearSVC with linear kernel
from sklearn.svm import LinearSVC

clf = LinearSVC()

# Fit (note: on the full data X, y rather than the training split)
clf.fit(X, y)
```

```
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)
```

Compared with the three SVC models above, the main changes are the squared hinge loss function (instead of the hinge loss) and an explicit l2 penalty parameter. LinearSVC is built on liblinear rather than libsvm, so it scales better to large sample sizes.
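The difference in loss function can be made explicit in code. This is a minimal sketch on synthetic data (`X_demo` and `y_demo` are assumptions, not the Titanic data) that fits `LinearSVC` once with its default squared hinge loss and once with the standard hinge loss. `max_iter` is raised only to reduce the chance of convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in data with three features (not the Titanic data)
X_demo, y_demo = make_classification(n_samples=300, n_features=3,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

# Default settings: squared hinge loss with an l2 penalty
clf_squared = LinearSVC(max_iter=10000)
clf_squared.fit(X_demo, y_demo)

# Standard hinge loss; with penalty='l2' this requires the dual formulation
clf_hinge = LinearSVC(loss='hinge', dual=True, max_iter=10000)
clf_hinge.fit(X_demo, y_demo)

# Training accuracy of each variant
print("squared hinge: %0.2f" % clf_squared.score(X_demo, y_demo))
print("hinge:         %0.2f" % clf_hinge.score(X_demo, y_demo))
```

Both variants learn one weight per feature in `coef_`, so the fitted models can be compared directly.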

For more on LinearSVC, you may refer to its documentation.

## Hyperparameters

**C**

**Kernel**

**Class weight**

**Fit Intercept**

**decision_function_shape**

**gamma**

**degree**

Used only when the kernel is 'poly'; the default is 3.
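As a quick reference, the sketch below sets each of the hyperparameters listed above explicitly. The comments give a brief, unofficial summary of each one, and the values shown are illustrative rather than recommendations. Note that `gamma='scale'` assumes a reasonably recent scikit-learn release, and that `fit_intercept` belongs to `LinearSVC`, not `SVC`.

```python
from sklearn.svm import SVC, LinearSVC

svc_demo = SVC(
    C=1.0,                          # regularization; smaller C regularizes more strongly
    kernel='poly',                  # 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'
    degree=3,                       # polynomial degree; ignored by the other kernels
    gamma='scale',                  # kernel coefficient for 'rbf', 'poly' and 'sigmoid'
    class_weight='balanced',        # weight classes inversely to their frequency
    decision_function_shape='ovr',  # one-vs-rest shape for multiclass decision values
)

# fit_intercept is a LinearSVC parameter: whether to fit an intercept term
linear_demo = LinearSVC(fit_intercept=True)

# Inspect the settings without fitting anything
params = svc_demo.get_params()
print(params['C'], params['kernel'], params['degree'], params['gamma'])
print(linear_demo.get_params()['fit_intercept'])
```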

## Tuning Hyperparameters

Let us explore how to tune the important hyperparameters of an SVM. We can use either grid search or randomized search.

In this example, I use grid search to find the best hyperparameters.

```python
# Import GridSearchCV
from sklearn.model_selection import GridSearchCV
from sklearn import svm

# Instantiate
svc = svm.SVC()

# Grid: one set of candidates per kernel
param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

# Grid search with cross-validation on the training set
gridsearch = GridSearchCV(svc, param_grid)
gridsearch.fit(x_train, y_train)

# Get best hyperparameters
gridsearch.best_params_
```

```
{'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
```

Alternatively, we can use randomized search to find the best parameters. Its advantages are that it is faster, since it evaluates only a fixed number of sampled candidates, and that hyperparameters such as C can be drawn from a distribution of values instead of a fixed list.
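A randomized search could look roughly like the sketch below. It uses `RandomizedSearchCV` with log-uniform distributions from `scipy.stats.loguniform` for `C` and `gamma`. The ranges, the `n_iter=10` budget and the synthetic data are all assumptions chosen for illustration; replace `X_demo` and `y_demo` with `x_train` and `y_train` to search on the Titanic features.

```python
from scipy.stats import loguniform
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data with three features (not the Titanic data)
X_demo, y_demo = make_classification(n_samples=200, n_features=3,
                                     n_informative=3, n_redundant=0,
                                     random_state=0)

# Distributions to sample from, instead of fixed lists of values
param_distributions = {
    'C': loguniform(1e0, 1e3),
    'gamma': loguniform(1e-4, 1e-1),
    'kernel': ['rbf'],
}

# n_iter caps the number of sampled candidates, which bounds the run time
random_search = RandomizedSearchCV(svm.SVC(), param_distributions,
                                   n_iter=10, cv=5, random_state=0)
random_search.fit(X_demo, y_demo)

# Best sampled combination and its mean cross-validation accuracy
print(random_search.best_params_)
print("Best CV accuracy: %0.2f" % random_search.best_score_)
```

Raising `n_iter` explores more candidates at the cost of a longer search, which makes the trade-off between speed and coverage explicit.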

## Summary

In this post, we explored the basic concepts of SVM, its advantages and disadvantages, and an example in Python. We also learnt how to tune the hyperparameters to obtain better performance.