We give a demonstration of the capabilities of liquidSVM from an R viewpoint. More detailed information can be found in the documentation and in the help (e.g. ?svm).
Disclaimer: liquidSVM and the R-bindings are in general quite stable and well tested by several people. However, use in production is at your own risk.
If you run into problems, please first check the documentation for more details, or report the bug to the maintainer.
Installation
To install and load the library, start an R session and type:
install.packages("liquidSVM", repos="http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/software/R")
library(liquidSVM)
Least Squares Regression
To illustrate liquidSVM for least squares regression we first load an example data set:
reg <- liquidData('reg-1d')
Now reg$train contains the training data and reg$test the test data. Both have the labels in their first column Y, and the feature is in column X1. To train on the data and select the best hyperparameters, do the following (an optional display=1 prints some information as the training progresses):
model <- svm(Y~., reg$train)
Now you can test the learned SVM with the test set:
result <- test(model, reg$test)
errors(result)
#> val_error
#> 0.00634
We can also plot the training data and the regression curve:
plot(reg$train$X1, reg$train$Y, ylim=c(-.2,.8), ylab='Y', xlab='X1', axes=T, pch='.', cex=2.5)
curve(predict(model, x), add=T, col='red', lwd=2)
Since reg already contains $train and $test, you can also run the whole experiment in one line. The result is then stored in model$last_result:
model <- svm(Y~., reg)
errors(model$last_result)[1]
#> val_error
#> 0.00606
Multi-Class Classification
We begin by loading an example data set for multi-class classification:
banana <- liquidData('banana-mc')
banana
#> LiquidData "banana-mc" with 4000 train samples and 4000 test samples
#> having 3 columns named Y,X1,X2
#> target "Y" factor with 4 levels: 1 (1200 samples) 2 (1200 samples) 3 (800 samples) ...
Since banana$train$Y is a factor, the following commands perform multi-class classification and plot the resulting decision boundaries:
model <- svm(Y~., banana$train)
mycol <- c('red', 'blue', 'cyan', 'green')
plot(banana$train$X1, banana$train$X2, pch=20, col=mycol[banana$train$Y], ylab='', xlab='', axes=F, lwd=0.25)
x <- seq(-1,1,.01)
z <- matrix(predict(model,expand.grid(x,x)),length(x))
contour(x,x,z, add=T, levels=1:4, col=1, lwd=3)
In this case errors(...) shows both the global misclassification error and the errors of the underlying binary tasks; for more details see the section on multi-class classification:
errors(test(model, banana$test))
#> result 1vs2 1vs3 1vs4 2vs3 2vs4 3vs4
#> 0.2250 0.1375 0.1150 0.1040 0.0880 0.0695 0.0000
Cells
For larger data sets, liquidSVM provides efficient data chunking methods. A simple way to invoke these is:
reg <- liquidData('reg-1d-larger')
model <- svm(Y~., reg, useCells=TRUE)
errors(model$last_result)
#> val_error
#> 0.00527
More information about the implemented data chunking methods can be found in the documentation.
Download and installation can be done from a running R session by
install.packages("liquidSVM", repos="http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/software/R")
library(liquidSVM)
If you are running Linux and you have CUDA installed, e.g. in /usr/local/cuda
, you can activate GPU support at installation time by
install.packages("liquidSVM", repos="http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/software/R", configure.args="native /usr/local/cuda")
library(liquidSVM)
Once the package is installed you can open the very vignette you are reading by issuing:
vignette('demo', package='liquidSVM')
After installation you have to load the package:
library(liquidSVM)
To get started we consider two problems using built-in R datasets:
- Regression with the trees data set (31 samples, 3 variables), using the formula Height ~ Girth + Volume: the Height of trees should be explained by both their Girth and Volume.
- Multi-class classification with the iris data set (150 samples, 5 variables), using the formula Species ~ .: the factor Species has 3 levels setosa, versicolor, and virginica. Since we are explaining a factor, this yields by default a multi-class classification problem.
Note that the dot on the right-hand side of the second formula means that all the other variables of iris are used to explain Species. Hence it is equivalent to:
Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
For the simplest out-of-the-box training use svm(...). If nothing else is specified, the type of the variable to be explained (the left-hand side of the formula) determines what kind of problem is addressed:
# least squares
modelTrees <- svm(Height ~ Girth + Volume, trees)
# multiclass classification
modelIris <- svm(Species ~ ., iris)
#> Warning in (function (model, command.args = NULL, ..., d = NULL,
#> warn.suboptimal = getOption("liquidSVM.warn.suboptimal", : Solution may not
#> be optimal: try training again using max_gamma=25
This trains a model by performing 5-fold cross-validation on a grid of 10 bandwidth parameters gamma and 10 cost parameters lambda. The issued warning suggests that the considered grid for gamma may be sub-optimal. After training the best parameters from the grid are selected by default.
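One way to follow the warning (a sketch based directly on the warning text; the model name modelIris2 is just illustrative) is to pass max_gamma when training again; the hyperparameter grids that were actually used can then be inspected:
modelIris2 <- svm(Species ~ ., iris, max_gamma=25)
modelIris2$gammas    # the bandwidth grid that was used
modelIris2$lambdas   # the cost grid that was used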
After training one can calculate predictions for the model as usual in R
:
predict(modelTrees, trees[21:31, ])
#> [1] 79.5 77.5 76.8 74.0 76.8 81.9 81.7 81.6 79.3 79.1 85.8
predict(modelIris, iris[3:8, ])
#> [1] setosa setosa setosa setosa setosa setosa
#> Levels: setosa versicolor virginica
Remark. The actual values that liquidSVM produces can vary since the random splits for cross-validation may differ, and different CPU architectures use different code for finding approximate SVM solutions in the iterative SVM solvers. This holds for all the outputs in this document.
Note that training on the full iris data gives a perfect fit:
all.equal(predict(modelIris, iris[3:8, ]), iris$Species[3:8])
#> [1] TRUE
but of course the better approach to machine learning is to split the data set into a training and a test set. To illustrate this, we split the data set quakes, which describes 1000 seismic events on Fiji by 5 numerical values, into a training set of size 800 and a test set with 200 samples:
qu <- ttsplit(quakes, testSize = 200)
Then we learn to predict the magnitude of the events with the help of the training set:
model <- svm(mag ~ ., qu$train)
and assess how well we have learned with the help of the test set:
result <- test(model, qu$test)
errors(result)
#> val_error
#> 0.033
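Since we are doing least squares regression, the val_error reported above should agree with the mean squared error of the predictions on the test set; a quick sketch to check this by hand:
pred <- predict(model, qu$test)
mean((pred - qu$test$mag)^2)   # should (approximately) reproduce val_error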
SVM models can be saved and loaded for later re-use. Since SVM models require the training data set for making predictions, liquidSVM provides two ways for saving SVM models: a stand-alone model ‘.fsol’ that contains the training set and a lighter model ‘.sol’ that does not contain the training set. The following commands create a model and save it in both ways:
banana <- liquidData('banana-bc')
modelOrig <- mcSVM(Y~., banana$train)
# Save the model as a stand-alone '.fsol' and as a lighter '.sol' file
write.liquidSVM(modelOrig, "banana-bc.fsol")
write.liquidSVM(modelOrig, "banana-bc.sol")
# Finally, we delete the SVM model
clean(modelOrig)
Let us now read the stand-alone model from the file and compute the test error:
modelRead <- read.liquidSVM("banana-bc.fsol")
errors(test(modelRead, banana$test))
#> [1] 0.144
To read the lighter model without the training data, we have to add the training set manually:
banana <- liquidData('banana-bc')
modelDataExternal <- read.liquidSVM("banana-bc.sol", Y~., banana$train)
result <- test(modelDataExternal, banana$test)
errors(result)
#> [1] 0.144
SVM models can also be serialized. The following commands illustrate this:
banana <- liquidData('banana-bc')
modelOrig <- mcSVM(Y~., banana$train)
# Now, we serialize the model into a raw vector and delete the original model
obj <- serialize.liquidSVM(modelOrig)
clean(modelOrig)
# Next we unserialize it from that raw vector and apply it to a test data set
modelUnserialized <- unserialize.liquidSVM(obj)
errors(test(modelUnserialized, banana$test))
#> [1] 0.141
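The raw vector returned by serialize.liquidSVM can also be kept across R sessions with base R; for example (the file name is purely illustrative):
# Persist the serialized model with base R and restore it later
saveRDS(obj, "banana-bc-svm.rds")
obj2 <- readRDS("banana-bc-svm.rds")
modelRestored <- unserialize.liquidSVM(obj2)
errors(test(modelRestored, banana$test))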
A major issue with SVMs is that for larger sample sizes the kernel matrix does not fit into memory. Classically, this imposes an upper limit on the size of problems that traditional SVMs can handle without a significant increase in runtime. The concept of cells makes it possible to circumvent these issues.
If you specify useCells=TRUE or voronoi=... (see the documentation for details on the latter option), the input space \(X\) is partitioned into a number of cells. The training is then done independently for cell 1, cell 2, and so on. To predict the label for a value \(x\in X\), liquidSVM first determines the cell this \(x\) belongs to and then uses the SVM of that cell to predict a label for it. Let us illustrate this:
banana <- liquidData('banana-bc')
banana
mycol <- c('red', 'blue')
plot(banana$train[,2:3], col=mycol[banana$train$Y], pch=20, ylab='', xlab='', axes=F, lwd=0.25)
# Train the model and plot the centers of the used cells
model <- mcSVM(Y~.,banana$train, voronoi=c(4,500))
errors(test(model, banana$test))
centers <- getCover(model)$samples
points(centers, pch=19, cex=2, col='cyan')
# Plot the boundaries of the cells
library(deldir)
xranges <- c(range(banana$train$X1),range(banana$train$X2))
voronoi <- deldir(centers$X1, centers$X2, rw=xranges)
plot(voronoi, wlines="tess", add=TRUE, lty=1, lwd=3, wpoints='none')
Let us now illustrate how using cells can decrease the training and prediction time. To this end, we consider a larger variant of the banana-bc
data set and train and test an SVM with 3 threads and no partitioning into cells:
banana <- liquidData('banana-bc-medium')
banana
#> LiquidData "banana-bc-medium" with 10000 train samples and 10000 test samples
#> having 3 columns named Y,X1,X2
#> target "Y" mean 0 range [-1,1]
system.time(model <- mcSVM(Y~., banana$train, threads=3))
#> user system elapsed
#> 125.280 0.352 42.409
system.time(error <- errors(test(model, banana$test, threads=3)))
#> user system elapsed
#> 5.752 0.008 1.956
error
#> val_error
#> 0.146
For both operations, the user time is about 3 times the elapsed time, which is not surprising since we used 3 threads. Let us now use cells for the same problem.
banana <- liquidData('banana-bc-medium')
system.time(model <- mcSVM(Y~., banana$train, threads=3, useCells=TRUE))
#> user system elapsed
#> 20.784 0.132 8.204
system.time(error <- errors(test(model, banana$test, threads=3)))
#> user system elapsed
#> 2.276 0.032 0.810
error
#> val_error
#> 0.147
Both training and prediction times are reduced by a factor between 4 and 5, while the test error is essentially unchanged. Let us finally consider a variant of the banana-bc
data set, for which training without cells only works on computers with more than 26GB of free RAM:
banana <- liquidData('banana-bc-large')
banana
#> LiquidData "banana-bc-large" with 50000 train samples and 50000 test samples
#> having 3 columns named Y,X1,X2
#> target "Y" mean 0 range [-1,1]
system.time(model <- mcSVM(Y~., banana$train, threads=3, useCells=TRUE))
#> Warning in selectSVMs(model): Solution may not be optimal: try training
#> again using max_gamma=25
#> user system elapsed
#> 106.16 0.54 41.71
system.time(error <- errors(test(model, banana$test, threads=3)))
#> user system elapsed
#> 9.432 0.116 3.352
error
#> val_error
#> 0.143
Note that the test error decreases with increasing training size, but the effect is not as strong as one may have hoped for. As suggested, one can increase the values for gamma, but on this data set this has only a minimal positive effect. This behavior is, however, not an issue of liquidSVM, but can be simply explained by the fact that even for the smallest version of banana-bc
the test error is already close to the best possible test error. In other words, there is simply not much room for improvement when increasing the sample size.
liquidSVM is also able to calculate the kernel on a GPU if it is compiled with CUDA support. Since there is a large overhead in moving the kernel matrix from the GPU memory, this is most useful for problems with many dimensions. Here we take the Gisette data set, which takes the digits 4 and 9 from the standard MNIST data of handwritten digits and adds some new attributes to obtain 6000 samples and 5000 attributes.
First we load the data into a liquidSVM-model:
gi <- liquidData('gisette')
model <- init.liquidSVM(Y~.,gi$train)
Now we train the model with and without GPU to compare:
system.time(trainSVMs(model, gpus=1, threads=1))
#> user system elapsed
#> 57 10 67
system.time(trainSVMs(model, gpus=0, threads=4))
#> user system elapsed
#> 392 1 110
Note that with GPU support liquidSVM cannot use more threads than available GPUs. Consequently, there is no time gain when using 4 threads with a single GPU system:
system.time(trainSVMs(model, gpus=1, threads=4))
#> user system elapsed
#> 94 42 67
Without GPU support, on the other hand, using only 1 thread substantially increases the training time compared to the call with 4 threads:
system.time(trainSVMs(model, gpus=0, threads=1))
#> user system elapsed
#> 327 1 329
libsvm
We use e1071::svm
, which is a binding for libsvm
. You can install it from CRAN by using
install.packages("e1071")
liquidSVM’s out-of-the-box behavior is to use 5-fold cross-validation:
banana <- liquidData('banana-bc')
system.time(liquid_svm_model <- svm(Y~., banana$train, threads=1))
#> user system elapsed
#> 2.616 0.028 2.644
For fairness, we used a single thread only. Let us now run the same 5-fold cross-validation in e1071::svm. To this end, we need to convert the parameter grid and call e1071::tune.svm:
folds <- 5
GAMMA <- 1/(liquid_svm_model$gammas)^2
COST <- 1/(2 * (folds-1)/folds * nrow(banana$train) * liquid_svm_model$lambdas)
system.time(e1071::tune.svm(Y~., data=banana$train, gamma=GAMMA, cost=COST, scale=F, tunecontrol=e1071::tune.control(cross=folds)))
#> user system elapsed
#> 222 0 222
For larger data sets the difference between both packages is even more pronounced.
liquidSVM organizes its work into tasks: e.g., in multi-class classification the problem has to be reduced to several binary classification problems, and in quantile regression the training phase simultaneously considers different quantiles. Behind the scenes, svm(formula, data, ...) does the following:
model <- init.liquidSVM(formula, data)
trainSVMs(model, ...)
selectSVMs(model, ...)
Here, the additional parameters for trainSVMs and selectSVMs need to be chosen manually to describe the different learning scenarios such as multi-class classification or quantile regression. Since this requires expertise and is in most cases cumbersome, liquidSVM provides higher-level functions for the learning scenarios discussed below.
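Before turning to these, the three stages can also be spelled out directly. The following minimal sketch uses only functions that appear elsewhere in this vignette and relies on the default least squares scenario:
reg <- liquidData('reg-1d')
model <- init.liquidSVM(Y ~ ., reg$train)   # initialize the model
trainSVMs(model)                            # train on the hyperparameter grid
selectSVMs(model)                           # select the best hyperparameters
errors(test(model, reg$test))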
For SVMs, multi-class classification has to be reduced to binary classification. There are two common strategies for this: all-versus-all (AvA), where one binary SVM is trained for every pair of labels, and one-versus-all (OvA), where one binary SVM is trained for every label against all the others. Then for any point in the test set, the winning label is chosen. A second, more subtle choice to make is whether the hinge or the least squares loss should be used for the binary classification problems.
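For a problem with k labels, AvA therefore trains one binary task per pair of labels, while OvA trains one per label; a quick base R illustration:
k <- 4                            # banana-mc below has 4 labels
c(AvA = choose(k, 2), OvA = k)    # 6 pairwise tasks vs. 4 one-vs-rest tasks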
To illustrate the resulting four possibilities, let us have a look at the example dataset banana-mc
, which has 4 labels:
banana <- liquidData('banana-mc')
mycol <- c('red', 'blue', 'cyan', 'green')
plot(banana$train$X1, banana$train$X2, pch=20, col=mycol[banana$train$Y], ylab='', xlab='', axes=F, lwd=0.25)
Since there are 6 pairings, AvA
trains 6 tasks, whereas OvA
only trains 4 tasks. Let us first have a look at training with AvA:
banana <- liquidData('banana-mc')
model <- mcSVM(Y~., banana, mc_type="AvA_hinge")
errors(model$last_result)
#> result 1vs2 1vs3 1vs4 2vs3 2vs4 3vs4
#> 0.2285 0.1371 0.1170 0.1020 0.0935 0.0730 0.0000
model$last_result[1:3,]
#> result 1vs2 1vs3 1vs4 2vs3 2vs4 3vs4
#> 1 3 1 3 1 3 2 3
#> 2 1 1 1 1 3 2 3
#> 3 3 1 3 1 3 2 3
model <- mcSVM(Y~., banana, mc_type="AvA_ls")
errors(model$last_result)
#> result 1vs2 1vs3 1vs4 2vs3 2vs4 3vs4
#> 0.2240 0.1342 0.1150 0.1050 0.0900 0.0715 0.0000
model$last_result[1:3,]
#> result 1vs2 1vs3 1vs4 2vs3 2vs4 3vs4
#> 1 3 -0.6331 0.8184 -0.995 0.996 -0.960 -1
#> 2 1 -0.8636 -0.0186 -0.994 0.901 -0.874 -1
#> 3 3 -0.0919 0.9745 -0.979 0.995 -0.968 -1
The first element in errors
gives the overall test error. The other errors correspond to the 6 binary classification tasks. Similarly, result
displays the final decision for a test sample in the first column, and the predictions of the 6 underlying binary classifications in the other columns. One can nicely see how the final prediction is based on a vote over the 6 binary classification tasks.
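For the least squares variant shown last, this vote can be reproduced by hand. The sign convention below is read off from the output above (a negative value in column "i vs j" favours label i); this is a sketch, not liquidSVM's internal code:
pairs <- t(combn(1:4, 2))                      # label pairs in column order: (1,2) ... (3,4)
scores <- as.matrix(model$last_result[1:3, -1])
apply(scores, 1, function(s) {
  winners <- ifelse(s < 0, pairs[, 1], pairs[, 2])   # winner of each pairing
  which.max(tabulate(winners, nbins = 4))            # label winning most pairings
})
# should reproduce the first column of model$last_result[1:3, ]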
Let us now have a look at training with OvA:
banana <- liquidData('banana-mc')
model <- mcSVM(Y~., banana, mc_type="OvA_ls")
errors(model$last_result)
#> result 1vsOthers 2vsOthers 3vsOthers 4vsOthers
#> 0.2240 0.1565 0.1207 0.0863 0.0735
model$last_result[1:3,]
#> result 1vsOthers 2vsOthers 3vsOthers 4vsOthers
#> 1 3 -0.820 -0.977 0.8220 -0.993
#> 2 1 -0.059 -0.967 -0.0806 -0.998
#> 3 3 -0.941 -0.929 0.8476 -0.992
model <- mcSVM(Y~., banana, mc_type="OvA_hinge")
errors(model$last_result)
#> result 1vsOthers 2vsOthers 3vsOthers 4vsOthers
#> 0.2268 0.1570 0.1250 0.0873 0.0742
model$last_result[1:3,]
#> result 1vsOthers 2vsOthers 3vsOthers 4vsOthers
#> 1 3 -1.000 -0.996 1.000 -1.000
#> 2 3 -0.547 -0.909 0.186 -0.998
#> 3 3 -0.987 -0.989 1.000 -1.000
Again, the first element in errors
gives the overall test error. The other errors correspond to the 4 binary classification tasks. Also result
displays the final decision for a test sample in the first column, and the predictions of the 4 underlying binary classifications in the other columns. Again, one can nicely see how the final prediction is based on a vote over the 4 binary classification tasks.
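The OvA decision can likewise be reproduced by hand: the predicted label is simply the class whose vsOthers score is largest (again a sketch, not liquidSVM's internal code):
scores <- as.matrix(model$last_result[1:3, -1])   # columns 1vsOthers ... 4vsOthers
apply(scores, 1, which.max)                       # should match the result column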
NOTE. AvA is usually faster, since every binary SVM just trains on the data belonging to only two labels. On the other hand, OvA_ls often gives better results at the cost of longer training time.
Important. Unlike the other 3 possibilities, OvA_hinge is not universally consistent. Consequently, it should only be used with particular care.
Quantile regression uses the quantile solver with the pinball loss and creates a predictor of the \(\tau\)-quantile for every \(\tau\) provided. The following lines provide an example:
reg <- liquidData('reg-1d')
quantiles_list <- c(0.05, 0.1, 0.5, 0.9, 0.95)
model <- qtSVM(Y ~ ., reg$train, weights=quantiles_list)
result_qt <- test(model, reg$test)
errors(result_qt)
#> [1] 0.00687 0.01174 0.02768 0.01311 0.00835
# Finally, let us plot the results
I <- order(reg$test$X1)
par(mar=rep(.1, 4))
plot(Y~X1, reg$test[I,], ylim=c(-.2,.8), ylab='Y', xlab='X', axes=T, pch='.', cex=2.5)
for(i in 1:length(quantiles_list))
lines(reg$test$X1[I], result_qt[I,i], col=i+1, lwd=2)
legend('bottomright', col=6:2, lwd=1, legend=quantiles_list[5:1])
In this plot you see 5 estimated conditional \(\tau\)-quantiles, where \(\tau\in\{0.05, 0.1, 0.5, 0.9, 0.95\}\), of the distribution of the label \(y\) given \(x\).
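As a simple sanity check (a sketch), the fraction of test labels lying below the estimated \(\tau\)-quantile curve should be roughly \(\tau\):
coverage <- sapply(seq_along(quantiles_list),
                   function(i) mean(reg$test$Y <= result_qt[, i]))
rbind(tau = quantiles_list, empirical = round(coverage, 3))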
Expectile regression uses the expectile solver with the weighted least squares loss and creates a predictor of the \(\tau\)-expectile for every \(\tau\) provided. Recall that the 0.5-expectile is in fact just the ordinary mean, and therefore expectiles generalize the mean in a similar way as quantiles generalize the median. The following lines provide an example:
reg <- liquidData('reg-1d')
expectiles_list <- c(0.05, 0.1, 0.5, 0.9, 0.95)
model <- exSVM(Y ~ ., reg$train, weights=expectiles_list)
result_ex <- test(model, reg$test)
errors(result_ex)
#> [1] 0.000986 0.001538 0.003329 0.002619 0.001867
# Finally, let us plot the results
I <- order(reg$test$X1)
par(mar=rep(.1,4))
plot(Y~X1, reg$test[I,], ylim=c(-.2,.8), ylab='Y', xlab='X', axes=T, pch='.', cex=2.5)
for(i in 1:length(expectiles_list))
lines(reg$test$X1[I], result_ex[I,i], col=i+1, lwd=2)
legend('bottomright', col=6:2, lwd=1, legend=expectiles_list[5:1])
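As a sketch of the remark above that the 0.5-expectile is just the ordinary (conditional) mean, its predictions should be close to those of a plain least squares SVM trained on the same data:
model.ls <- lsSVM(Y ~ ., reg$train)
mean((predict(model.ls, reg$test) - result_ex[, 3])^2)   # expect a small value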
Neyman-Pearson learning (NPL) attempts classification under the constraint that the probability of false positives (Type I error) is bounded by a significance level alpha, which in liquidSVM is called the NPL-constraint. The following lines illustrate NPL for 5 different NPL-constraints:
banana <- liquidData('banana-bc-medium')
npl_constraints <- c(0.025, 0.033, 0.05, 0.075, 0.1)
# The option class=-1 specifies the normal class
model <- nplSVM(Y ~ ., banana, class=-1, constraint.factor=npl_constraints)
result_npl <- model$last_result
errors(result_npl)
#> [1] 0.448 0.377 0.295 0.244 0.244
false_alarm_rate <- apply(result_npl[banana$test$Y==-1,]==1, 2, mean)
detection_rate <- apply(result_npl[banana$test$Y==1,]==1, 2, mean)
rbind(npl_constraints,false_alarm_rate,detection_rate)
#> V1 V2 V3 V4 V5
#> npl_constraints 0.0250 0.0330 0.0500 0.0750 0.1000
#> false_alarm_rate 0.0216 0.0332 0.0604 0.0774 0.0774
#> detection_rate 0.5520 0.6232 0.7048 0.7558 0.7558
You can see that the false alarm rates on the test set approximately satisfy the NPL-constraints. In addition, the detection rate increases with increasing NPL-constraint.
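If one instead wants to bound the error on the other class, the class= option described above can be flipped accordingly (a sketch; the remaining arguments stay unchanged):
# Treat the class labelled 1 as the normal class instead
model_pos <- nplSVM(Y ~ ., banana, class=1, constraint.factor=npl_constraints)
errors(model_pos$last_result)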
Receiver Operating Characteristic (ROC) curves plot the trade-off between the false alarm rate and the detection rate for different weights (default is 9 weights). The following commands provide an example:
banana <- liquidData('banana-bc')
model <- rocSVM(Y ~ ., banana$train)
result_roc <- test(model, banana$test)
false_positive_rate <- apply(result_roc[banana$test$Y==-1,]==1,2,mean)
detection_rate <- apply(result_roc[banana$test$Y==1,]==1,2,mean)
plot(false_positive_rate, detection_rate, xlim=0:1, ylim=0:1, asp=1, type='b', pch='x')
abline(0, 1, lty=2)
Note that the plotted part of the ROC curve is near the north-west corner, which indicates a rather favorable learning behavior on this data set.
An alternative and typically quicker way to calculate an ROC curve is to use least squares regression, which estimates the conditional probability at any point. For this we can use the method plotROC:
banana <- liquidData('banana-bc')
model.ls <- lsSVM(Y~., banana$train)
plotROC(model.ls, banana$test, xlim=0:1, ylim=0:1, asp=1, type='l')
points(false_positive_rate, detection_rate, pch='x', col='red')
And we see that both methods give almost the same results on this data set.
Sometimes, one needs to calculate the kernel matrix for other purposes. Since liquidSVM provides parallel support for computing the kernel matrix, we have added a simple function to access the corresponding routine:
banana <- liquidData("banana-bc")$train[1:100,-1]
a <- kern(banana)
a[1:4,1:4]
#> [,1] [,2] [,3] [,4]
#> [1,] 1.000 0.983 0.959 0.711
#> [2,] 0.983 1.000 0.893 0.612
#> [3,] 0.959 0.893 1.000 0.836
#> [4,] 0.711 0.612 0.836 1.000
image(liquidSVM::kern(banana, gamma=1.1, type="gauss"))
image(liquidSVM::kern(banana, gamma=1.1, type="poisson"))
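As one example of using the kernel matrix for other purposes, the Gaussian similarities can be turned into kernel-induced distances and fed into base R clustering. This is just a sketch; it only relies on the fact that the Gaussian kernel satisfies k(x,x)=1, as the unit diagonal above shows:
K <- kern(banana, gamma=1.1, type="gauss")
d <- as.dist(sqrt(pmax(2 - 2*K, 0)))   # ||phi(x)-phi(y)||^2 = 2 - 2*k(x,y)
hc <- hclust(d)
plot(hc, labels=FALSE)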
For convenience we provide several datasets prepared for training and testing.
http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/liquidData
They can be imported by name e.g. using:
liquidData('reg-1d')
This loads both reg-1d.train.csv
as well as reg-1d.test.csv
into reg$train
and reg$test
, respectively.
liquidData data sets have a strict format: they have comma-separated numerical values and no header. The first column is the label and gets the variable name Y
. The other columns are the features and get variable names X1, X2,
…
You can also load part of the samples as in the following examples:
# take 10% of training and testing data
liquidData('reg-1d', prob=0.1)
# a sample of 400 train samples and the same relative size of test samples
liquidData('reg-1d', trainSize=400)
# a sample of 400 train samples and all test samples
liquidData('reg-1d', trainSize=400, testSize=Inf)
By default the sub-sampling is stratified if the target Y is a factor.
Before downloading these data sets from our website, liquidData first tries some directories in the filesystem (configured by the character vector of locations in the parameter loc=):
- the working directory getwd()
- "~/liquidData" (in Windows, ~ typically is C:\Users\username\Documents)
- http://pnp.mathematik.uni-stuttgart.de/isa/steinwart/liquidData
The data sets can be gzipped, which is recognized by the additional extension .gz, e.g. reg-1d.train.csv.gz and reg-1d.test.csv.gz.
If you want to split any data.frame
into train/test and have it in the same format as above, use ttsplit(...)
. You can write such a data set to your filesystem with the help of write.liquidData
.
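For example, any built-in data.frame can be brought into this train/test format (a minimal sketch; see ?write.liquidData for the arguments needed to write the result to disk):
irisSplit <- ttsplit(iris, testSize=50)
str(irisSplit$train)   # same layout as the $train part of a liquidData object
# write.liquidData(...) can then be used to save it; see its help page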