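I start with a data frame of 23,515 rows and 3 columns. I split the data 70/30 into training and test sets. I am fitting a classification model with the SVM from the e1071 package to predict the variable MISSING. After fitting the model, I try to predict MISSING on the test set, but I get the following error: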
> ftplh_svm <- svm(MISSING ~ V1+V2, data=train_vars, type="C-classification", kernel="linear")
> p <- predict(ftplh_svm, test_vars, type="class")
Error in predict.svm(object, ...) : test data does not match model !
Following a suggestion in another post, I tried removing the predicted class from the test set:
> p <- predict(ftplh_svm, test_vars[-3], type="class")
Error in predict.svm(object, ...) : test data does not match model !
I also tried dropping empty levels, as Brad suggested, but no levels were actually dropped and I got the same result:
> train_vars$V1 <- droplevels(as.factor(train_vars$V1))
> train_vars$V2 <- droplevels(as.factor(train_vars$V2))
> train_vars$MISSING <- droplevels(as.factor(train_vars$MISSING))
> test_vars$V1 <- droplevels(as.factor(test_vars$V1))
> test_vars$V2 <- droplevels(as.factor(test_vars$V2))
> test_vars$MISSING <- droplevels(as.factor(test_vars$MISSING))
> ftplh_svm <- svm(MISSING ~ V1+V2, data=train_vars, type="C-classification", kernel="linear")
> p <- predict(ftplh_svm, test_vars, type="class")
Error in predict.svm(object, ...) : test data does not match model !
The structure of my training and test sets:
> str(train_vars)
'data.frame': 16395 obs. of 3 variables:
$ V1: Factor w/ 148 levels "AAC","AAL","AGP",..: 1 1 2 2 2 2 2 2 2 2 ...
$ V2 : Factor w/ 284 levels "6AR","AAC","AAL",..: 79 42 180 180 180 180 180 180 180 180 ...
$ MISSING : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
> str(test_vars)
'data.frame': 7129 obs. of 3 variables:
$ V1: Factor w/ 111 levels "AAC","AAL","AGP",..: 1 2 2 2 2 2 2 2 2 2 ...
$ V2 : Factor w/ 265 levels "AAC","AAL","ABZ",..: 225 169 169 169 169 169 169 169 169 169 ...
$ MISSING : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
I checked whether the test set contains any new levels (I did this for each variable):
> train_lev <- levels(train_vars$V1)
> test_lev <- levels(test_vars$V1)
> # these levels only exist in the test set
> new_levels <- setdiff(test_lev,train_lev)
> new_levels
character(0)
> # how many observations is it?
> obs <- which(test_vars$V1 %in% new_levels)
> length(obs)
[1] 0
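The same check can be run over every predictor in one pass. This is a minimal sketch on toy data, not my actual data: `train_toy`, `test_toy`, and `compare_levels` are hypothetical names introduced for illustration. It also reports levels that occur only in the training set, since the `str()` output above shows the two sets have different numbers of levels (148 vs. 111 for V1, 284 vs. 265 for V2).

```r
# Toy stand-ins for train_vars / test_vars (hypothetical values).
train_toy <- data.frame(
  V1 = factor(c("AAC", "AAL", "AGP", "AAL")),
  V2 = factor(c("6AR", "AAC", "AAL", "AAL"))
)
test_toy <- data.frame(
  V1 = factor(c("AAC", "AAL")),
  V2 = factor(c("AAC", "ABZ"))
)

# For each predictor column, list the levels that appear only in the
# test set and the levels that appear only in the training set.
compare_levels <- function(train, test, cols) {
  lapply(setNames(cols, cols), function(col) {
    list(
      only_in_test  = setdiff(levels(test[[col]]), levels(train[[col]])),
      only_in_train = setdiff(levels(train[[col]]), levels(test[[col]]))
    )
  })
}

level_diff <- compare_levels(train_toy, test_toy, c("V1", "V2"))
str(level_diff)
```

An empty `only_in_test` for a column corresponds to the `character(0)` result above. A non-empty `only_in_train` means the two factors do not share the same level set, even when the test set has no new levels.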