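I have a training set with 14 variables (13 predictors and 1 response) and a test set with 13 variables (the same 13 predictors as in the training set). Here are the details: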
> str(kaggletrain)
'data.frame': 1861 obs. of 14 variables:
$ condit : num 0 0 0 0 0 0 0 0 0 0 ...
$ condition : Factor w/ 6 levels "For parts or not working",..: 6 6 6 4 5 6 3 3 6 6 ...
$ good : num 0 0 0 0 0 0 0 0 0 0 ...
$ ipad : num 1 0 0 0 0 0 0 0 0 0 ...
$ new : num 0 0 0 0 0 0 0 0 0 0 ...
$ scratch : num 0 1 0 0 0 0 0 0 0 0 ...
$ screen : num 0 1 0 0 0 0 0 0 0 0 ...
$ this : num 0 0 0 0 0 0 0 0 0 0 ...
$ use : num 0 1 0 0 0 0 0 0 0 0 ...
$ work : num 0 0 0 0 0 0 0 0 0 0 ...
$ sold : Factor w/ 2 levels "0","1": 1 2 2 1 1 2 2 1 2 2 ...
$ WordCount : int 45 100 0 0 100 0 0 0 0 0 ...
$ biddable : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 2 1 2 2 ...
$ productionline: Factor w/ 12 levels "iPad 1","iPad 2",..: 2 2 4 9 12 9 8 10 1 4 ...
> str(kaggletest)
'data.frame': 798 obs. of 13 variables:
$ condit : num 0 0 0 0 1 0 0 0 0 0 ...
$ condition : Factor w/ 6 levels "For parts or not working",..: 6 6 6 6 2 4 5 6 6 6 ...
$ good : num 0 0 1 0 0 0 1 0 0 0 ...
$ ipad : num 0 1 1 0 1 1 0 1 0 0 ...
$ new : num 1 0 0 0 0 1 0 0 0 0 ...
$ scratch : num 0 0 0 0 0 0 0 0 0 0 ...
$ screen : num 0 0 0 0 0 0 0 0 0 0 ...
$ this : num 0 0 1 0 0 0 0 0 1 0 ...
$ use : num 0 0 0 0 0 0 0 0 0 0 ...
$ work : num 0 0 1 0 1 0 0 0 1 0 ...
$ WordCount : int 8 106 99 0 99 88 33 25 101 0 ...
$ biddable : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 2 ...
$ productionline: Factor w/ 10 levels "iPad 1","iPad 2",..: 1 8 3 7 3 6 1 4 3 1 ...
I built a randomForest model on the training set, and it runs fine. However, when I call predict on the test set, I get this error:
> bestrf = randomForest(sold~.,data=kaggletrain,mtry=3,ntree=400)
> pred.rf = predict(bestrf,kaggletest)
Error in predict.randomForest(bestrf, kaggletest) :
Type of predictors in new data do not match that of the training data.
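The error talks about predictor *types*, not names. So it may help to compare the factor levels of the two data frames directly instead of eyeballing the str() output. Below is a minimal base-R sketch. The helper `compare_factor_levels` is a hypothetical name and is not part of randomForest or any other package. For every factor column the two data frames share, it reports how many levels each side has and which levels appear on only one side.

```r
# Hypothetical helper (base R only): compare the levels of every factor column
# that two data frames have in common.
compare_factor_levels <- function(train, test) {
  # Declared levels for factors; sorted unique values for anything else.
  lv <- function(x) if (is.factor(x)) levels(x) else levels(factor(x))
  shared <- intersect(names(train), names(test))
  # Keep columns that are a factor in at least one of the two data frames.
  is_fac <- vapply(shared,
                   function(v) is.factor(train[[v]]) || is.factor(test[[v]]),
                   logical(1))
  rows <- lapply(shared[is_fac], function(v) {
    tr <- lv(train[[v]])
    te <- lv(test[[v]])
    data.frame(variable         = v,
               train_levels     = length(tr),
               test_levels      = length(te),
               identical_levels = identical(tr, te),
               missing_in_test  = paste(setdiff(tr, te), collapse = ", "),
               unseen_in_train  = paste(setdiff(te, tr), collapse = ", "),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy data: the test factor only contains 2 of the 3 training levels.
toy_train <- data.frame(
  line = factor(c("iPad 1", "iPad 2", "iPad 3")),
  y    = factor(c("0", "1", "0"))
)
toy_test <- data.frame(line = factor(c("iPad 1", "iPad 2")))

res <- compare_factor_levels(toy_train, toy_test)
print(res)
```

Run as `compare_factor_levels(kaggletrain, kaggletest)` on the data above, this should at least flag productionline (12 levels in training vs 10 in test), since the response `sold` is not shared and is therefore skipped.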
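As you can see, the predictor names are identical in the training and test sets. One predictor, productionline, has 12 levels in the training set but only 10 in the test set (I assumed that would be fine). I have also used predict with a logistic regression model on this dataset, and that works. So I'm not sure what I'm doing wrong. Any help is greatly appreciated.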
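One likely cause is the productionline mismatch (12 vs 10 levels). This assumes that predict.randomForest checks whether each factor predictor in the new data has the same number of levels as in the training data, so matching names alone are not enough. A common remedy is to rebuild each factor in the test set using the training set's levels. Below is a minimal base-R sketch of that idea. `align_factor_levels` is a hypothetical helper, not a randomForest function.

```r
# Hypothetical helper (base R only): give every factor column in `test` the
# same levels, in the same order, as the matching column in `train`.
# Values are matched by label; labels absent from the training levels become NA.
align_factor_levels <- function(train, test) {
  for (v in intersect(names(train), names(test))) {
    if (is.factor(train[[v]])) {
      test[[v]] <- factor(as.character(test[[v]]),
                          levels  = levels(train[[v]]),
                          ordered = is.ordered(train[[v]]))
    }
  }
  test
}

# Toy data: the test factor only contains 2 of the 3 training levels.
toy_train <- data.frame(
  line = factor(c("iPad 1", "iPad 2", "iPad 3")),
  y    = factor(c("0", "1", "0"))
)
toy_test <- data.frame(line = factor(c("iPad 2", "iPad 1")))

aligned <- align_factor_levels(toy_train, toy_test)
nlevels(toy_test$line)  # 2
nlevels(aligned$line)   # 3, the same levels as toy_train$line

# Applied to the objects from the question (not run here):
# kaggletest <- align_factor_levels(kaggletrain, kaggletest)
# pred.rf <- predict(bestrf, kaggletest)
```

This sketch rebuilds the factor with `factor(..., levels = ...)` on purpose. That call matches values by label. Assigning `levels(kaggletest$productionline) <- levels(kaggletrain$productionline)` instead would relabel the existing codes by position, so a test row could silently end up labelled with a different product line. Any test value that never occurs in the training data becomes NA and still needs separate handling.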