R SVM predict: Error in predict.svm: test data does not match model

Time: 2019-07-24 14:32:25

Tags: r svm

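I started with a data frame of 23,515 rows and 3 columns and split it 70/30 into training and test sets. I am fitting a classification model with the SVM from the e1071 package to predict the variable MISSING. After fitting the model, I try to predict MISSING on the test set, but I get the following error: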

> ftplh_svm <- svm(MISSING ~ V1+V2, data=train_vars, type="C-classification", kernel="linear")
> p <- predict(ftplh_svm, test_vars, type="class")
Error in predict.svm(object, ...) : test data does not match model !

Following a suggestion in another post, I tried removing the predicted class from the test set:

> p <- predict(ftplh_svm, test_vars[-3], type="class")
Error in predict.svm(object, ...) : test data does not match model !

I also tried dropping empty levels, as Brad suggested, but no levels were actually dropped and I got the same result:

> train_vars$V1 <- droplevels(as.factor(train_vars$V1))
> train_vars$V2 <- droplevels(as.factor(train_vars$V2))
> train_vars$MISSING <- droplevels(as.factor(train_vars$MISSING))
> test_vars$V1 <- droplevels(as.factor(test_vars$V1))
> test_vars$V2 <- droplevels(as.factor(test_vars$V2))
> test_vars$MISSING <- droplevels(as.factor(test_vars$MISSING))
> ftplh_svm <- svm(MISSING ~ V1+V2, data=train_vars, type="C-classification", kernel="linear")
> p <- predict(ftplh_svm, test_vars, type="class")
Error in predict.svm(object, ...) : test data does not match model !

The structure of my training and test sets:

> str(train_vars)
'data.frame':   16395 obs. of  3 variables:
 $ V1: Factor w/ 148 levels "AAC","AAL","AGP",..: 1 1 2 2 2 2 2 2 2 2 ...
 $ V2  : Factor w/ 284 levels "6AR","AAC","AAL",..: 79 42 180 180 180 180 180 180 180 180 ...
 $ MISSING      : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
> str(test_vars)
'data.frame':   7129 obs. of  3 variables:
 $ V1: Factor w/ 111 levels "AAC","AAL","AGP",..: 1 2 2 2 2 2 2 2 2 2 ...
 $ V2  : Factor w/ 265 levels "AAC","AAL","ABZ",..: 225 169 169 169 169 169 169 169 169 169 ...
 $ MISSING      : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
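The two `str()` outputs report different level counts for the predictors (148 vs. 111 for V1, 284 vs. 265 for V2). A minimal base-R sketch of why that can matter, assuming the formula interface expands each factor into dummy columns through `model.matrix` and that the prediction step compares the resulting column counts; the toy data frames below are hypothetical and not taken from the question:

```r
# Toy data (hypothetical): train declares three levels of V1, test only two.
train_toy <- data.frame(V1 = factor(c("AAC", "AAL", "AGP", "AAL")))
test_toy  <- data.frame(V1 = factor(c("AAC", "AAL")))

# A factor is expanded into dummy columns based on its declared levels,
# so a different level set gives a design matrix of a different width.
train_cols <- ncol(model.matrix(~ V1, data = train_toy))
test_cols  <- ncol(model.matrix(~ V1, data = test_toy))

print(c(train = train_cols, test = test_cols))
```

If the widths differ like this on the real data, a column-count check in the prediction code could plausibly produce the "test data does not match model" message.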

I checked whether my test set contains any new levels (I did this for every variable):

> train_lev <- levels(train_vars$V1)
> test_lev <- levels(test_vars$V1)
> # these levels only exist in the test set
> new_levels <- setdiff(test_lev,train_lev)
> new_levels
character(0)
> # how many observations is it?
> obs <- which(test_vars$V1 %in% new_levels)
> length(obs)
[1] 0
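The check above only looks for levels that exist in the test set but not in the training set. The reverse direction may also be worth checking, since the training factors declare more levels than the test factors. The following is a sketch under that assumption, using hypothetical toy data and the illustrative names `train_toy`, `test_toy` and `aligned_test`; it lists the training levels missing from the test factors and then re-declares the test factors with the training levels:

```r
# Toy data (hypothetical values), mirroring the column names in the question.
train_toy <- data.frame(
  V1 = factor(c("AAC", "AAL", "AGP", "AAL")),
  V2 = factor(c("6AR", "AAC", "AAL", "AAC"))
)
test_toy <- data.frame(
  V1 = factor(c("AAC", "AAL")),
  V2 = factor(c("AAC", "AAL"))
)

predictors <- c("V1", "V2")

# Levels the training factor declares that the test factor does not.
missing_in_test <- lapply(predictors, function(v) {
  setdiff(levels(train_toy[[v]]), levels(test_toy[[v]]))
})
names(missing_in_test) <- predictors
print(missing_in_test)

# Re-declare each test factor with the training levels. Values that are not
# among the training levels would become NA, so check for those separately.
aligned_test <- test_toy
for (v in predictors) {
  aligned_test[[v]] <- factor(as.character(test_toy[[v]]),
                              levels = levels(train_toy[[v]]))
}

# After alignment both design matrices have the same number of columns.
same_width <- ncol(model.matrix(~ V1 + V2, data = train_toy)) ==
  ncol(model.matrix(~ V1 + V2, data = aligned_test))
print(same_width)
```

On the real data, the same loop could be applied to `test_vars` with the levels of `train_vars` before calling `predict`. Whether that resolves the error in this case is untested here.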

0 answers:

No answers yet