Question

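So, I'm trying to extract all of the tree data from a randomForest object and put it into a data frame. I pull out one tree at a time, cbind it with the tree's index, and then try to rbind everything together. Here is my code. It should be easy to reproduce.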

# Do some setup, and train a basic random forest model

library(randomForest)
data(iris)

model <- randomForest(Species ~ ., data=iris)

# Make a data frame containing all the tree data

output <- data.frame()

for (i in 1:model[['forest']][['ntree']]) {
  new_values <- getTree(model, i)
  new_values <- cbind(tree = rep(i, nrow(new_values)), new_values)

  output <- rbind(output, test_new, make.row.names = FALSE)

  # Added for debug purposes...
  new_values
  output
  break
}

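So, when I look at new_values after the first step, the tree value is 1. But when I look at the data frame "output", the tree value is 500. If I let the loop run without the debugging code, then at the end of the whole loop "tree" equals 500 for the entire data set. Obviously I want tree to be an index running from 1 to 500.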

Clearly I'm either making some major mistake, or the rbind process is somehow changing the values in my data. What is going on here?

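(I suppose I could rewrite this with do.call and lapply and see whether anything changes, but I'd still like to understand the mechanics of why this doesn't work, for learning purposes.)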

Answer 1

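You passed test_new to rbind instead of new_values, so rbind is not changing anything. test_new is presumably a leftover object in your workspace (for example from an earlier experiment in which the tree index was 500), and that object is what gets appended on every iteration. In a fresh R session the same line would instead fail because test_new would not exist. I changed it and tried the code below, which gives a data frame containing all the tree data, labelled by tree number: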

# Do some setup, and train a basic random forest model

library(randomForest)
data(iris)

model <- randomForest(Species ~ ., data=iris)

# Make a data frame containing all the tree data

output <- data.frame()

for (i in 1:model[['forest']][['ntree']]) {
  new_values <- getTree(model, i)
  new_values <- cbind(tree = rep(i, nrow(new_values)), new_values)

  output <- rbind(output, new_values, make.row.names = FALSE)

}
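
For the do.call and lapply rewrite mentioned in the question, here is a minimal sketch. It assumes the same iris model as above; the intermediate name trees and the seed value are only illustrative. Because each per-tree data frame is built inside its own function call, there is no loop variable left over that could be mixed up with a stale workspace object:

```r
library(randomForest)
data(iris)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
model <- randomForest(Species ~ ., data = iris)

# Build one data frame per tree, each tagged with its own index.
# data.frame() recycles the single value i to every row of the tree.
trees <- lapply(seq_len(model$ntree), function(i) {
  data.frame(tree = i, getTree(model, i))
})

# Bind all per-tree data frames together in a single call
output <- do.call(rbind, trees)

# The tree column should now span 1 to the number of trees
range(output$tree)
```

Note that data.frame() converts the column names returned by getTree into syntactic names (for example "left daughter" becomes left.daughter). Binding once at the end also avoids growing output inside a loop, which copies the accumulated data frame on every iteration.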

rbind changes the values in a column
