Question

I built a model with random forest and tried to test it on another dataset using predict(). However, it returns only NA.

library(randomForest)

RF <- randomForest(intention ~ ., data = train, ntree = 1000, na.action = na.roughfix)
# neither the train nor the test dataset contains any NA

# Predicting
pred <- predict(RF, newdata = test, type = "response")
# pred contains only NA

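I checked this page and verified that my datasets contain no NA, but I keep getting the same result. https://www.kaggle.com/c/the-analytics-edge-mit-15-071x/discussion/7808

I also checked this page, but it does not seem to apply to random forest (or I do not understand it). r - loess prediction returns NA

Thanks for your help!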

Answer 1

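As @Allan Cameron guessed, the problem was a mismatch between the two datasets. While I was having trouble running the RF algorithm, I found a suggestion on this forum to remove variables with too few distinct values (here, numeric columns with only one unique value), using the following code.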

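index <- c()
for (j in 1:41) {
  # flag numeric columns that hold a single unique value
  if (is.numeric(train[, j]) && length(unique(train[, j])) == 1) {
    index <- append(index, j)
  }
}
# guard: train[, -index] with an empty index would drop every column
if (length(index) > 0) train <- train[, -index]
# ran on test dataset too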

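However, I had not noticed that it removed 5 columns from train and 9 columns from test. predict() then tried to apply a model built on 51 variables to a dataset with 47 variables, which returned NA without raising any error.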
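One way to avoid this kind of mismatch is to decide which columns to drop from train only, apply that same selection to test, and compare column names before calling predict(). The following is a minimal sketch in base R, not the original poster's code: the toy data frames and the names is_constant, keep and missing_in_test are assumptions made for illustration.

```r
# Toy data standing in for the real train/test sets (hypothetical values).
train <- data.frame(intention = factor(c("a", "b", "a", "b")),
                    x1 = c(1, 2, 3, 4),
                    x2 = c(5, 5, 5, 5),       # constant in train
                    x3 = c(0.1, 0.4, 0.2, 0.9))
test <- data.frame(intention = factor(c("a", "b", "b")),
                   x1 = c(7, 7, 7),           # constant in test only
                   x2 = c(5, 5, 5),
                   x3 = c(0.3, 0.8, 0.5))

# Flag numeric columns with a single unique value, looking at train only.
is_constant <- vapply(train,
                      function(col) is.numeric(col) && length(unique(col)) == 1,
                      logical(1))
keep <- names(train)[!is_constant]

# Apply the same column selection to both datasets.
train <- train[, keep, drop = FALSE]
test <- test[, keep, drop = FALSE]

# Check to run before predict(): every training column must exist in test.
missing_in_test <- setdiff(names(train), names(test))
stopifnot(length(missing_in_test) == 0)
```

Because the selection is computed once and reused, x1 stays in test even though it is constant there, so both data frames end up with the same columns in the same order.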

predict() returns only NA with random forest
