liquidSVM: Demo for R

Philipp Thomann & Ingo Steinwart

2019-08-02

We give a demonstration of the capabilities of liquidSVM from an R viewpoint. More detailed information can be found in the documentation and in the help (e.g. ?svm).

Disclaimer: liquidSVM and the R-bindings are in general quite stable and well tested by several people. However, use in production is at your own risk.

If you run into problems please check first the documentation for more details, or report the bug to the maintainer.

liquidSVM in one Minute

Installation

To install and load the library, start an R session and type:

install.packages("liquidSVM", repos="http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/software/R")
library(liquidSVM)

Least Squares Regression

To illustrate liquidSVM for least squares regression we first load an example data set:

reg <- liquidData('reg-1d')

Now reg$train contains the training data and reg$test the test data. Both have the label in their first column Y, and the feature is in column X1. To train on the data and select the best hyperparameters, do:

model <- svm(Y~., reg$train)
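
To watch the progress of the training, set the optional display=1:

model <- svm(Y~., reg$train, display=1)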

Now you can test the learned SVM with the test set:

result <- test(model, reg$test)
errors(result)
#> val_error 
#>   0.00634

We can also plot the training data and the regression curve:

plot(reg$train$X1, reg$train$Y, ylim=c(-.2,.8), ylab='Y', xlab='X1', axes=T, pch='.', cex=2.5)
curve(predict(model, x), add=T, col='red', lwd=2)

Since reg already contains $train and $test you can also run the whole experiment in one line. Then the result is stored in model$last_result:

model <- svm(Y~., reg)
errors(model$last_result)[1]
#> val_error 
#>   0.00606

Multi-Class Classification

We begin by loading an example data set for multi-class classification:

banana <- liquidData('banana-mc')
banana
#> LiquidData "banana-mc" with 4000 train samples and 4000 test samples
#>   having 3 columns named Y,X1,X2
#>   target "Y" factor with 4 levels: 1 (1200 samples) 2 (1200 samples) 3 (800 samples) ...

Since banana$train$Y is a factor the following commands perform multi-class classification and plot the resulting decision boundaries:

model <- svm(Y~., banana$train)
mycol <- c('red', 'blue', 'cyan', 'green')
plot(banana$train$X1, banana$train$X2, pch=20, col=mycol[banana$train$Y], ylab='', xlab='', axes=F, lwd=0.25)
x <- seq(-1,1,.01)
z <- matrix(predict(model,expand.grid(x,x)),length(x))
contour(x,x,z, add=T, levels=1:4, col=1, lwd=3)

In this case errors(...) shows both the global misclassification error and the errors of the underlying binary tasks; for more details see Multiclass classification:

errors(test(model, banana$test))
#> result   1vs2   1vs3   1vs4   2vs3   2vs4   3vs4 
#> 0.2250 0.1375 0.1150 0.1040 0.0880 0.0695 0.0000

Cells

For larger data sets, liquidSVM provides efficient data chunking methods. A simple way to invoke these is:

reg <- liquidData('reg-1d-larger')
model <- svm(Y~., reg, useCells=TRUE)
errors(model$last_result)
#> val_error 
#>   0.00527

More information about the implemented data chunking methods can be found in the documentation.

First Steps

Installation

Download and installation can be done from a running R session by

install.packages("liquidSVM", repos="http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/software/R")
library(liquidSVM)

If you are running Linux and you have CUDA installed, e.g. in /usr/local/cuda, you can activate GPU support at installation time by

install.packages('liquidSVM', repos="http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/software/R", configure.args="native /usr/local/cuda")
library(liquidSVM)

Once the package is installed you can open the very vignette you are reading by issuing:

vignette('demo', package='liquidSVM')

After installation you have to load the package:

library(liquidSVM)

Training and Testing Models

To get started we consider two problems using built-in R datasets:

For the simplest out-of-the-box training use svm(...). If nothing else is specified, then the type of the response variable determines which kind of problem is addressed:

# least squares
modelTrees <- svm(Height ~ Girth + Volume, trees)

# multiclass classification
modelIris <- svm(Species ~ ., iris)
#> Warning in (function (model, command.args = NULL, ..., d = NULL,
#> warn.suboptimal = getOption("liquidSVM.warn.suboptimal", : Solution may not
#> be optimal: try training again using max_gamma=25

This trains a model by performing 5-fold cross-validation on a grid of 10 bandwidth parameters gamma and 10 cost parameters lambda. The issued warning suggests that the considered grid for gamma may be sub-optimal. After training the best parameters from the grid are selected by default.
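
The warning above can be addressed by retraining with a larger bandwidth range; a minimal sketch using the max_gamma value that the warning itself suggests:

modelIris <- svm(Species ~ ., iris, max_gamma=25)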

After training one can calculate predictions for the model as usual in R:

predict(modelTrees, trees[21:31, ])
#>  [1] 79.5 77.5 76.8 74.0 76.8 81.9 81.7 81.6 79.3 79.1 85.8
predict(modelIris, iris[3:8, ])
#> [1] setosa setosa setosa setosa setosa setosa
#> Levels: setosa versicolor virginica

Remark. The actual values that liquidSVM produces can vary, since the random splits for cross-validation may differ, and different CPU architectures use different code paths for finding approximate solutions in the iterative SVM solvers. This holds for all outputs in this document.

Note that training on the full iris data gives a perfect fit:

all.equal(predict(modelIris, iris[3:8, ]), iris$Species[3:8])
#> [1] TRUE

but of course the better approach to machine learning is to split the data set into a training and a test set. To illustrate this, we split the data set quakes, which describes 1000 seismic events off Fiji by 5 numerical values, into a training set of size 800 and a test set with 200 samples:

qu <- ttsplit(quakes, testSize = 200)

Then we learn to predict the magnitude of the events with the help of the training set:

model <- svm(mag ~ ., qu$train)

and assess how well we have learned with the help of the test set:

result <- test(model, qu$test)
errors(result)
#> val_error 
#>     0.033

Saving and Loading Models

SVM models can be saved and loaded for later re-use. Since SVM models require the training data set for making predictions, liquidSVM provides two ways for saving SVM models: a stand-alone model ‘.fsol’ that contains the training set and a lighter model ‘.sol’ that does not contain the training set. The following commands create a model and save it in both ways:

banana <- liquidData('banana-bc')
modelOrig <- mcSVM(Y~., banana$train)
# save the model in both formats
write.liquidSVM(modelOrig, "banana-bc.fsol")
write.liquidSVM(modelOrig, "banana-bc.sol")

# Finally, we delete the SVM model

clean(modelOrig) 

Let us now read the stand-alone model from the file and compute the test error:

modelRead <- read.liquidSVM("banana-bc.fsol")
errors(test(modelRead, banana$test))
#> [1] 0.144

To read the lighter model without the training data, we have to add the training set manually:

banana <- liquidData('banana-bc')
modelDataExternal <- read.liquidSVM("banana-bc.sol", Y~., banana$train)
result <- test(modelDataExternal, banana$test)
errors(result)
#> [1] 0.144

SVM models can also be serialized. The following commands illustrate this:

banana <- liquidData('banana-bc')
modelOrig <- mcSVM(Y~., banana$train)

# Now, we serialize the model into a raw vector and delete the original model

obj <- serialize.liquidSVM(modelOrig)
clean(modelOrig) 

# Next we unserialize it from that raw vector and apply it to a test data set

modelUnserialized <- unserialize.liquidSVM(obj)
errors(test(modelUnserialized, banana$test))
#> [1] 0.141

Increasing Speed and Avoiding the Memory Barrier: Cells

A major issue with SVMs is that for larger sample sizes the kernel matrix does not fit into memory. Classically, this places an upper limit on the size of the problems that traditional SVMs can handle without a significant increase in runtime. The concept of cells makes it possible to circumvent both issues.
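
A quick back-of-the-envelope computation illustrates the problem for the 50000-sample data set considered at the end of this section:

# a dense kernel matrix for n samples in double precision needs n^2 * 8 bytes
n <- 50000
n^2 * 8 / 2^30   # roughly 19 GiB, before any solver working memory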

If you specify useCells=TRUE or voronoi=... (see the documentation for details on the latter option), the input space \(X\) is partitioned into a number of cells. The training is then performed independently, first for cell 1, then for cell 2, and so on. To predict the label for a value \(x\in X\), liquidSVM first determines the cell this \(x\) belongs to, and then uses the SVM of that cell to predict a label for it. Let us illustrate this:

banana <- liquidData('banana-bc')
banana
mycol <- c('red', 'blue')
plot(banana$train[,2:3], col=mycol[banana$train$Y], pch=20, ylab='', xlab='', axes=F, lwd=0.25)

# Train the model and plot the centers of the used cells 

model <- mcSVM(Y~.,banana$train, voronoi=c(4,500))
errors(test(model, banana$test))
centers <- getCover(model)$samples
points(centers, pch=19, cex=2, col='cyan')

# Plot the boundaries of the cells

library(deldir)
xranges <- c(range(banana$train$X1),range(banana$train$X2))
voronoi <- deldir(centers$X1, centers$X2, rw=xranges)
plot(voronoi, wlines="tess", add=TRUE, lty=1, lwd=3, wpoints='none')

Let us now illustrate how using cells can decrease the training and prediction time. To this end, we consider a larger variant of the banana-bc data set and train and test an SVM with 3 threads and no partitioning into cells:

banana <- liquidData('banana-bc-medium')
banana
#> LiquidData "banana-bc-medium" with 10000 train samples and 10000 test samples
#>   having 3 columns named Y,X1,X2
#>   target "Y" mean 0 range [-1,1]
system.time(model <- mcSVM(Y~., banana$train, threads=3))
#>    user  system elapsed 
#> 125.280   0.352  42.409
system.time(error <- errors(test(model, banana$test, threads=3)))
#>    user  system elapsed 
#>   5.752   0.008   1.956
error
#> val_error 
#>     0.146
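
The threads=3 above assumes that at least 3 cores are available; if you are unsure about your machine, base R's parallel package (not part of liquidSVM) can tell you:

# number of logical cores available on this machine
parallel::detectCores()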

For both operations, the user time is about 3 times the elapsed time, which is not surprising since we used 3 threads. Let us now use cells for the same problem.

banana <- liquidData('banana-bc-medium')
system.time(model <- mcSVM(Y~., banana$train, threads=3, useCells=TRUE))
#>    user  system elapsed 
#>  20.784   0.132   8.204
system.time(error <- errors(test(model, banana$test, threads=3)))
#>    user  system elapsed 
#>   2.276   0.032   0.810
error
#> val_error 
#>     0.147

Both the training and the prediction time are reduced by a factor between 4 and 5, while the test error is essentially unchanged. Let us finally consider a variant of the banana-bc data set for which training without cells only works on computers with more than 26GB of free RAM:

banana <- liquidData('banana-bc-large')
banana
#> LiquidData "banana-bc-large" with 50000 train samples and 50000 test samples
#>   having 3 columns named Y,X1,X2
#>   target "Y" mean 0 range [-1,1]
system.time(model <- mcSVM(Y~., banana$train, threads=3, useCells=TRUE))
#> Warning in selectSVMs(model): Solution may not be optimal: try training
#> again using max_gamma=25
#>    user  system elapsed 
#>  106.16    0.54   41.71
system.time(error <- errors(test(model, banana$test, threads=3)))
#>    user  system elapsed 
#>   9.432   0.116   3.352
error
#> val_error 
#>     0.143

Note that the test error decreases with increasing training size, but the effect is not as strong as one may have hoped for. As suggested, one can increase the values for gamma, but on this data set this has only a minimal positive effect. This behavior is, however, not an issue of liquidSVM, but can be simply explained by the fact that even for the smallest version of banana-bc the test error is already close to the best possible test error. In other words, there is simply not much room for improvement when increasing the sample size.

CUDA

liquidSVM is also able to calculate the kernel matrix on a GPU if it is compiled with CUDA support. Since there is a large overhead in moving the kernel matrix out of GPU memory, this is most useful for problems with many dimensions. We use here the Gisette data set, which takes the digits 4 and 9 from the standard MNIST data of handwritten digits and adds some new attributes to obtain 6000 samples and 5000 attributes.

First we load the data into a liquidSVM-model:

gi <- liquidData('gisette')
model <- init.liquidSVM(Y~.,gi$train)

Now we train the model with and without GPU to compare:

system.time(trainSVMs(model, gpus=1, threads=1))
#>   user  system elapsed
#>     57     10       67
system.time(trainSVMs(model, gpus=0, threads=4))
#>   user  system elapsed
#>    392       1     110

Note that with GPU support liquidSVM cannot use more threads than there are GPUs available. Consequently, there is no time gain when using 4 threads on a single-GPU system:

system.time(trainSVMs(model, gpus=1, threads=4))
#>   user  system elapsed
#>     94      42      67

while without GPU support, using only 1 thread substantially increases the training time compared to the call with 4 threads:

system.time(trainSVMs(model, gpus=0, threads=1))
#>   user  system elapsed
#>    327       1     329

A Quick Comparison to libsvm

We use e1071::svm, which is a binding for libsvm. You can install it from CRAN by using

install.packages("e1071")

liquidSVM’s out-of-the-box behavior is to use 5-fold cross-validation:

banana <- liquidData('banana-bc')
system.time(liquid_svm_model <- svm(Y~., banana$train, threads=1))
#>    user  system elapsed 
#>   2.616   0.028   2.644

For fairness, we used a single thread only. Let us now run the same 5-fold cross-validation in e1071::svm. To this end, we need to convert the parameter grid and call e1071::tune.svm:

folds <- 5
GAMMA <- 1/(liquid_svm_model$gammas)^2
COST <- 1/(2 * (folds-1)/folds * nrow(banana$train) * liquid_svm_model$lambdas)

system.time(e1071::tune.svm(Y~., data=banana$train, gamma=GAMMA, cost=COST, scale=F, tunecontrol=e1071::tune.control(cross=folds)))
#>    user  system elapsed 
#>     222       0     222

For larger data sets the difference between both packages is even more pronounced.

Learning Scenarios

liquidSVM organizes its work into tasks: e.g., in multi-class classification the problem has to be reduced to several binary classification problems, and in quantile regression the training phase simultaneously considers several quantiles. Behind the scenes, svm(formula, data, ...) does the following:

model <- init.liquidSVM(formula, data)
trainSVMs(model, ...)
selectSVMs(model, ...)

Here, the additional parameters for trainSVMs and selectSVMs need to be chosen manually to describe the different learning scenarios such as multi-class classification or quantile regression. Since this requires expertise and is in most cases cumbersome, liquidSVM provides higher-level functions for the learning scenarios described below.
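
For instance, lsSVM, the scenario function for least squares regression used later in this demo, is in essence this three-step pipeline with a least squares configuration; a minimal sketch:

reg <- liquidData('reg-1d')
model <- lsSVM(Y ~ ., reg$train)
errors(test(model, reg$test))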

Multiclass classification

For SVMs multi-class classification has to be reduced to binary classification. There are two common strategies for this: all-versus-all (AvA), in which one binary SVM is trained for every pair of labels, and one-versus-all (OvA), in which one binary SVM is trained for every label against the union of the others.

Then for any point in the test set, the winning label is chosen. A second, more subtle choice to make is whether the hinge or the least squares loss should be used for the binary classification problems.

To illustrate the resulting four possibilities, let us have a look at the example dataset banana-mc, which has 4 labels:

banana <- liquidData('banana-mc')
mycol <- c('red', 'blue', 'cyan', 'green')
plot(banana$train$X1, banana$train$X2, pch=20, col=mycol[banana$train$Y], ylab='', xlab='', axes=F, lwd=0.25)

Since there are 6 pairings, AvA trains 6 tasks, whereas OvA only trains 4 tasks. Let us first have a look at training with AvA:

banana <- liquidData('banana-mc')

model <- mcSVM(Y~., banana, mc_type="AvA_hinge")
errors(model$last_result)
#> result   1vs2   1vs3   1vs4   2vs3   2vs4   3vs4 
#> 0.2285 0.1371 0.1170 0.1020 0.0935 0.0730 0.0000
model$last_result[1:3,]
#>   result 1vs2 1vs3 1vs4 2vs3 2vs4 3vs4
#> 1      3    1    3    1    3    2    3
#> 2      1    1    1    1    3    2    3
#> 3      3    1    3    1    3    2    3

model <- mcSVM(Y~., banana, mc_type="AvA_ls")
errors(model$last_result)
#> result   1vs2   1vs3   1vs4   2vs3   2vs4   3vs4 
#> 0.2240 0.1342 0.1150 0.1050 0.0900 0.0715 0.0000
model$last_result[1:3,]
#>   result    1vs2    1vs3   1vs4  2vs3   2vs4 3vs4
#> 1      3 -0.6331  0.8184 -0.995 0.996 -0.960   -1
#> 2      1 -0.8636 -0.0186 -0.994 0.901 -0.874   -1
#> 3      3 -0.0919  0.9745 -0.979 0.995 -0.968   -1

The first element in errors gives the overall test error. The other errors correspond to the 6 binary classification tasks. Similarly, result displays the final decision for a test sample in the first column, and the predictions of the 6 underlying binary classifications in the other columns. One can nicely see how the final prediction is based on a vote over the 6 binary classification tasks.
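
This vote can be recomputed by hand; a small check for the first AvA_hinge test sample shown above (ties aside):

# winning labels of the 6 pairings for the first test sample, copied from above
votes <- c(1, 3, 1, 3, 2, 3)
which.max(tabulate(votes, nbins=4))   # label 3 wins the majority vote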

Let us now have a look at training with OvA:

banana <- liquidData('banana-mc')

model <- mcSVM(Y~., banana, mc_type="OvA_ls")
errors(model$last_result)
#>    result 1vsOthers 2vsOthers 3vsOthers 4vsOthers 
#>    0.2240    0.1565    0.1207    0.0863    0.0735
model$last_result[1:3,]
#>   result 1vsOthers 2vsOthers 3vsOthers 4vsOthers
#> 1      3    -0.820    -0.977    0.8220    -0.993
#> 2      1    -0.059    -0.967   -0.0806    -0.998
#> 3      3    -0.941    -0.929    0.8476    -0.992

model <- mcSVM(Y~., banana, mc_type="OvA_hinge")
errors(model$last_result)
#>    result 1vsOthers 2vsOthers 3vsOthers 4vsOthers 
#>    0.2268    0.1570    0.1250    0.0873    0.0742
model$last_result[1:3,]
#>   result 1vsOthers 2vsOthers 3vsOthers 4vsOthers
#> 1      3    -1.000    -0.996     1.000    -1.000
#> 2      3    -0.547    -0.909     0.186    -0.998
#> 3      3    -0.987    -0.989     1.000    -1.000

Again, the first element in errors gives the overall test error, and the other errors correspond to the 4 binary classification tasks. Also, result displays the final decision for a test sample in the first column and the predictions of the 4 underlying one-versus-all tasks in the other columns. Here the final prediction is the label whose task returns the largest value, as the small check below illustrates.
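
A small check for the first OvA_hinge test sample shown above:

# one-vs-others predictions of the first test sample, copied from above
scores <- c(-1.000, -0.996, 1.000, -1.000)
which.max(scores)   # label 3 returns the largest value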

NOTE. AvA is usually faster, since every binary SVM trains only on the data belonging to two labels. On the other hand, OvA_ls often gives better results, at the cost of a longer training time.

Important. Unlike the other 3 possibilities, OvA_hinge is not universally consistent. Consequently, it should only be used with particular care.

Quantile regression

This uses the quantile solver with pinball loss and creates a predictor of the \(\tau\)-quantile for every \(\tau\) provided. The following lines provide an example for quantile regression:

reg <- liquidData('reg-1d')
quantiles_list <- c(0.05, 0.1, 0.5, 0.9, 0.95)

model <- qtSVM(Y ~ ., reg$train, weights=quantiles_list)
result_qt <- test(model, reg$test)
errors(result_qt)
#> [1] 0.00687 0.01174 0.02768 0.01311 0.00835

# Finally, let us plot the results

I <- order(reg$test$X1)
par(mar=rep(.1, 4))
plot(Y~X1, reg$test[I,], ylim=c(-.2,.8), ylab='Y', xlab='X', axes=T, pch='.', cex=2.5)
for(i in 1:length(quantiles_list))
  lines(reg$test$X1[I], result_qt[I,i], col=i+1, lwd=2)
legend('bottomright', col=6:2, lwd=1, legend=quantiles_list[5:1])

In this plot you see 5 estimated conditional \(\tau\)-quantiles, where \(\tau\in\{0.05, 0.1, 0.5, 0.9, 0.95\}\), of the distribution of the label \(y\) given \(x\).
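
The values reported by errors should then be the average pinball losses on the test set; a minimal sketch checking this for the median, assuming this interpretation of errors():

# pinball loss: tau*(y-f) for y >= f, and (tau-1)*(y-f) otherwise
tau <- 0.5
res <- reg$test$Y - result_qt[, 3]   # column 3 holds the 0.5-quantile predictions
mean(pmax(tau * res, (tau - 1) * res))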

Expectile regression

This uses the expectile solver with the weighted least squares loss and creates a predictor of the \(\tau\)-expectile for every \(\tau\) provided. Recall that the 0.5-expectile is in fact just the ordinary mean, and therefore expectiles generalize the mean in a similar way as quantiles generalize the median. The following lines provide an example for expectile regression:

reg <- liquidData('reg-1d')
expectiles_list <- c(0.05, 0.1, 0.5, 0.9, 0.95)

model <- exSVM(Y ~ ., reg$train, weights=expectiles_list)
result_ex <- test(model, reg$test)
errors(result_ex)
#> [1] 0.000986 0.001538 0.003329 0.002619 0.001867

# Finally, let us plot the results

I <- order(reg$test$X1)
par(mar=rep(.1,4))
plot(Y~X1, reg$test[I,], ylim=c(-.2,.8), ylab='Y', xlab='X', axes=T, pch='.', cex=2.5)
for(i in 1:length(expectiles_list))
  lines(reg$test$X1[I], result_ex[I,i], col=i+1, lwd=2)
legend('bottomright', col=6:2, lwd=1, legend=expectiles_list[5:1])
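
Analogously to the quantile case, one can check the reported error for \(\tau=0.5\), assuming errors() averages the asymmetric least squares loss mentioned above:

# weighted least squares loss: tau*(y-f)^2 for y >= f, and (1-tau)*(y-f)^2 otherwise
tau <- 0.5
res <- reg$test$Y - result_ex[, 3]   # column 3 holds the 0.5-expectile predictions
mean(ifelse(res >= 0, tau, 1 - tau) * res^2)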

Neyman-Pearson-Learning

Neyman-Pearson learning (NPL) attempts classification under the constraint that the probability of false positives (the Type I error) is bounded by a significance level alpha, which in liquidSVM is called the NPL-constraint. The following lines illustrate NPL for 5 different NPL-constraints:

banana <- liquidData('banana-bc-medium')
npl_constraints <- c(0.025, 0.033, 0.05, 0.075, 0.1)

# The option class=-1 specifies the normal class
model <- nplSVM(Y ~ ., banana, class=-1, constraint.factor=npl_constraints)
result_npl <- model$last_result
errors(result_npl)
#> [1] 0.448 0.377 0.295 0.244 0.244

false_alarm_rate <- apply(result_npl[banana$test$Y==-1,]==1, 2, mean)
detection_rate <- apply(result_npl[banana$test$Y==1,]==1, 2, mean)
rbind(npl_constraints,false_alarm_rate,detection_rate)
#>                      V1     V2     V3     V4     V5
#> npl_constraints  0.0250 0.0330 0.0500 0.0750 0.1000
#> false_alarm_rate 0.0216 0.0332 0.0604 0.0774 0.0774
#> detection_rate   0.5520 0.6232 0.7048 0.7558 0.7558

You can see that the false alarm rates on the test set approximately satisfy the NPL-constraints. In addition, the detection rate increases with the NPL-constraint.

ROC curve

Receiver Operating Characteristic (ROC) curves plot the trade-off between the false alarm rate and the detection rate for different weights (default is 9 weights). The following commands provide an example:

banana <- liquidData('banana-bc')

model <- rocSVM(Y ~ ., banana$train)
result_roc <- test(model, banana$test)

false_positive_rate <- apply(result_roc[banana$test$Y==-1,]==1,2,mean)
detection_rate <- apply(result_roc[banana$test$Y==1,]==1,2,mean)

plot(false_positive_rate, detection_rate, xlim=0:1, ylim=0:1, asp=1, type='b', pch='x')
abline(0, 1, lty=2)

Note that the plotted part of the ROC curve is near the north-west corner, which indicates a rather favorable learning behavior on this data set.

An alternative and typically quicker way to calculate an ROC curve is to use least squares regression, which estimates the conditional probability at any point. For this we use the same method, plotROC, applied to the least squares model:

banana <- liquidData('banana-bc')

model.ls <- lsSVM(Y~., banana$train)

plotROC(model.ls, banana$test, xlim=0:1, ylim=0:1, asp=1, type='l')
points(false_positive_rate, detection_rate, pch='x', col='red')

And we see that both methods give almost the same results on this data set.

Calculating the Kernel Matrix

Sometimes, one needs to calculate the kernel matrix for other purposes. Since liquidSVM provides parallel support for computing the kernel matrix, we have added a simple function to access the corresponding routine:

banana <- liquidData("banana-bc")$train[1:100,-1]
a <- kern(banana)
a[1:4,1:4]
#>       [,1]  [,2]  [,3]  [,4]
#> [1,] 1.000 0.983 0.959 0.711
#> [2,] 0.983 1.000 0.893 0.612
#> [3,] 0.959 0.893 1.000 0.836
#> [4,] 0.711 0.612 0.836 1.000

image(liquidSVM::kern(banana, gamma=1.1, type="gauss"))

image(liquidSVM::kern(banana, gamma=1.1, type="poisson"))
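
The parameterization of the Gaussian kernel can be verified against a direct computation; a minimal sketch assuming \(k(x,y)=\exp(-\lVert x-y\rVert^2/\gamma^2)\), which is consistent with the e1071 conversion used earlier:

gamma <- 1.1
d2 <- as.matrix(dist(banana))^2   # squared Euclidean distances
max(abs(kern(banana, gamma=gamma, type="gauss") - exp(-d2 / gamma^2)))   # should be ~0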

liquidData

For convenience we provide several datasets prepared for training and testing.

http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/liquidData

They can be imported by name e.g. using:

liquidData('reg-1d')

This loads both reg-1d.train.csv as well as reg-1d.test.csv into reg$train and reg$test, respectively.

liquidData data sets have a strict format: they contain comma-separated numerical values and no header. The first column is the label and gets the variable name Y. The other columns are the features and get the variable names X1, X2, and so on.
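
A minimal sketch of what this convention amounts to, assuming a local copy of reg-1d.train.csv in the strict format just described:

tr <- read.csv("reg-1d.train.csv", header=FALSE)
names(tr) <- c("Y", paste0("X", seq_len(ncol(tr) - 1)))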

You can also load part of the samples as in the following examples:

# take 10% of training and testing data
liquidData('reg-1d', prob=0.1)

# a sample of 400 train samples and the same relative size of test samples
liquidData('reg-1d', trainSize=400)

# a sample of 400 train samples and all test samples
liquidData('reg-1d', trainSize=400, testSize=Inf)

By default, the sub-sampling is stratified if the target Y is a factor.
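
A quick check, using the class counts of banana-mc shown earlier:

# with trainSize=400 one expects roughly 120/120/80/80 samples per class
sub <- liquidData('banana-mc', trainSize=400)
table(sub$train$Y)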

Before downloading these data sets from our website, liquidData first tries some directories in the filesystem (configured by the character vector of locations in the parameter loc=):

  1. the working directory getwd()
  2. your home directory "~/liquidData" (on Windows, ~ typically is C:\Users\username\Documents)
  3. the webpage http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/liquidData

The data sets can be gzip-ped, which is recognized by the additional extension .gz, e.g. reg-1d.train.csv.gz and reg-1d.test.csv.gz.

If you want to split any data.frame into train/test and have it in the same format as above, use ttsplit(...). You can write such a data set to your filesystem with the help of write.liquidData.