R - Unable to predict with randomForest

Time: 2015-07-22 09:10:05

Tags: r random-forest predict

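I have a training set with 14 variables (13 predictors and 1 response) and a test set with 13 variables (the same 13 predictors as in the training set). Here are the details: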

> str(kaggletrain)
'data.frame':   1861 obs. of  14 variables:
 $ condit        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ condition     : Factor w/ 6 levels "For parts or not working",..: 6 6 6 4 5 6 3 3 6 6 ...
 $ good          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ipad          : num  1 0 0 0 0 0 0 0 0 0 ...
 $ new           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ scratch       : num  0 1 0 0 0 0 0 0 0 0 ...
 $ screen        : num  0 1 0 0 0 0 0 0 0 0 ...
 $ this          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ use           : num  0 1 0 0 0 0 0 0 0 0 ...
 $ work          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ sold          : Factor w/ 2 levels "0","1": 1 2 2 1 1 2 2 1 2 2 ...
 $ WordCount     : int  45 100 0 0 100 0 0 0 0 0 ...
 $ biddable      : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 2 1 2 2 ...
 $ productionline: Factor w/ 12 levels "iPad 1","iPad 2",..: 2 2 4 9 12 9 8 10 1 4 ...
> str(kaggletest)
'data.frame':   798 obs. of  13 variables:
 $ condit        : num  0 0 0 0 1 0 0 0 0 0 ...
 $ condition     : Factor w/ 6 levels "For parts or not working",..: 6 6 6 6 2 4 5 6 6 6 ...
 $ good          : num  0 0 1 0 0 0 1 0 0 0 ...
 $ ipad          : num  0 1 1 0 1 1 0 1 0 0 ...
 $ new           : num  1 0 0 0 0 1 0 0 0 0 ...
 $ scratch       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ screen        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ this          : num  0 0 1 0 0 0 0 0 1 0 ...
 $ use           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ work          : num  0 0 1 0 1 0 0 0 1 0 ...
 $ WordCount     : int  8 106 99 0 99 88 33 25 101 0 ...
 $ biddable      : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 2 ...
 $ productionline: Factor w/ 10 levels "iPad 1","iPad 2",..: 1 8 3 7 3 6 1 4 3 1 ...

I built a randomForest model on the training set, and it fit without problems. However, when I then called predict on the test set, I got this error:

> bestrf = randomForest(sold~.,data=kaggletrain,mtry=3,ntree=400)
> pred.rf = predict(bestrf,kaggletest)
Error in predict.randomForest(bestrf, kaggletest) : 
  Type of predictors in new data do not match that of the training data.

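As you can see, the predictor names are exactly the same in the training and test sets. One predictor, productionline, has 12 levels in the training set and 10 levels in the test set (I assumed this would be fine). I also used predict with a logistic regression model on the same dataset, and that worked. So I am not sure what I am doing wrong. Any help is much appreciated.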
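
One likely cause, given the str output above, is the productionline factor: predict.randomForest compares the number of levels of each factor predictor in the new data against the training data, so 12 levels versus 10 would be enough to trigger this error even though the column names match. The following is a minimal sketch of checking for and working around that mismatch, not a confirmed fix for this dataset. The data frames `train` and `test` are toy stand-ins, and `align_factor_levels` is a hypothetical helper written for illustration; it is not part of the randomForest package.

```r
# Toy stand-ins for kaggletrain / kaggletest (made-up data, for illustration only).
train <- data.frame(
  productionline = factor(c("iPad 1", "iPad 2", "iPad 3", "iPad 2")),
  WordCount      = c(45L, 100L, 0L, 10L),
  sold           = factor(c("0", "1", "1", "0"))
)
test <- data.frame(
  productionline = factor(c("iPad 1", "iPad 2")),
  WordCount      = c(8L, 106L)
)

# Diagnosis: list the predictors whose factor levels differ between the two sets.
shared_cols <- intersect(names(train), names(test))
mismatched <- shared_cols[vapply(
  shared_cols,
  function(col) !identical(levels(train[[col]]), levels(test[[col]])),
  logical(1)
)]
print(mismatched)

# Hypothetical helper: re-declare each factor column of `test` using the
# level set of the matching column in `train`. Values that do not occur in
# the training levels become NA and would need separate handling.
align_factor_levels <- function(test, train) {
  for (col in names(test)) {
    if (is.factor(train[[col]])) {
      test[[col]] <- factor(as.character(test[[col]]),
                            levels = levels(train[[col]]))
    }
  }
  test
}

test_aligned <- align_factor_levels(test, train)
str(test_aligned)

# With the data from the question, the call would then look like:
# pred.rf <- predict(bestrf, align_factor_levels(kaggletest, kaggletrain))
```

Re-declaring the levels only changes the factor metadata, not the observed values, so the rows of the test set stay the same. This would also be consistent with the logistic regression working: predict on a glm accepts a factor that has fewer levels than in training, as long as no unseen level appears.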

0 answers:

No answers