R: putting multiple randomForest objects into a vector

Asked: 2011-10-19 02:38:02

Tags: r list random-forest

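I'm curious whether R can put objects into a vector/list/array/etc. I'm using the randomForest package on subsets of a larger dataset and would like to store each version in a list. It would be similar to this: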

answers <- c()
for(i in 1:10){
x <- round((1/i), 3)
answers <- (rbind(answers, x))
}

Ideally, I'd like to do something like this:

answers <- c()
for(i in 1:10){
RF <- randomForest(training, training$data1, sampsize=c(100), do.trace=TRUE, importance=TRUE, ntree=50, forest=TRUE)
answers <- (rbind(answers, RF))
}

This sort of works, but here is the output for a single RF object:

> RF 

Call:
 randomForest(x = training, y = training$data1, ntree = 50, sampsize = c(100), importance = TRUE, do.trace = TRUE,      forest = TRUE) 
               Type of random forest: regression
                     Number of trees: 10
No. of variables tried at each split: 2

          Mean of squared residuals: 0.05343956
                    % Var explained: 14.32

Whereas this is the output of the 'answers' list:

> answers 
   call       type         predicted      mse        rsq        oob.times      importance importanceSD
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
RF Expression "regression" Numeric,150000 Numeric,10 Numeric,10 Integer,150000 Numeric,16 Numeric,8   
   localImportance proximity ntree mtry forest  coefs y              test inbag
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 
RF NULL            NULL      10    2    List,11 NULL  Integer,150000 NULL NULL 

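Does anyone know how to store all of the RF objects, or how to call them so that the stored information is the same as for a single RF object? Thanks for any suggestions.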

4 Answers:

Answer 0: (score: 9)

Don't grow vectors or lists one element at a time. Pre-allocate them and assign objects to specific positions:

answers <- vector("list",10)
for (i in 1:10){
    answers[[i]] <- randomForest(training, training$data1, sampsize=c(100), 
                                 do.trace=TRUE, importance=TRUE, ntree=50,
                                 forest=TRUE)
}

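As a side note, rbind-ing vectors does not create another vector or a list; if you inspect the output of your first example, you'll see that it is a matrix with one column. This explains the odd behavior you observed when trying to rbind randomForest objects.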
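
The matrix behavior can be checked with base R alone. The sketch below is only an illustration: `fake_model` is a hypothetical hand-made S3 object standing in for a randomForest fit, so that the code runs without the randomForest package or the asker's data.

```r
# rbind() on plain numbers yields a one-column matrix, not a vector
answers <- c()
for (i in 1:3) {
  answers <- rbind(answers, round(1 / i, 3))
}
is.matrix(answers)
dim(answers)          # rows x columns

# Hypothetical stand-in for a fitted model: an S3 object built on a list
fake_model <- structure(
  list(type = "regression", mse = c(0.1, 0.2), ntree = 10L),
  class = "fake_model"
)

# rbind() treats each list-like object as one row of components,
# so the result is a matrix whose cells hold the components
m <- rbind(fake_model, fake_model)
is.matrix(m)
dim(m)                # one row per object, one column per component
```

Each row of `m` corresponds to one object and each column to one of its components, which matches the layout of the `answers` printout in the question.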

Answer 1: (score: 5)

Use lapply:

lapply(1:10,function(i) randomForest(<your parameters>))

You will get a list of random forest objects; you can then access the i-th one with the [[ ]] operator.
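
A runnable sketch of the same pattern, under stated assumptions: `lm()` on the built-in `mtcars` data is used here purely as a stand-in for `randomForest()`, and the subset size of 20 rows is arbitrary.

```r
# Stand-in for randomForest(): fit one lm() per random subset of mtcars
set.seed(1)
fits <- lapply(1:10, function(i) {
  rows <- sample(nrow(mtcars), 20)   # random subset for this iteration
  lm(mpg ~ wt, data = mtcars[rows, ])
})

length(fits)        # one list element per model
class(fits[[3]])    # each element keeps its own class
coef(fits[[3]])     # work with the third model only
```

Because `lapply` builds the list itself, every fitted object stays intact as a single element, and there is no accumulator to grow by hand.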

Answer 2: (score: 3)

Initialize the list with:

mylist <- vector("list")  # an empty list; in R, lists are a kind of vector

Add to it with:

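new_element <- 5
mylist <- c(mylist, new_element)
# for a list-like object such as a randomForest fit, wrap it first:
# mylist <- c(mylist, list(new_element))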

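@joran's advice about pre-allocation is relevant when lists are large, but it is not strictly necessary when they are small. You can also access the matrix that you built in your original code. It looks a bit odd, but the information is all there. For example, the first row of that list-matrix (the components of the first model) can be recovered with: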

answers[1, ]
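
To see what such a row contains, here is a small base-R sketch. `fake_model` is a hypothetical S3 object used in place of a randomForest fit; restoring the class on the extracted row is an assumption borrowed from the class-restoring trick in another answer, not something confirmed for real randomForest objects here.

```r
# Hypothetical stand-in for a fitted model: an S3 object built on a list
fake_model <- structure(
  list(type = "regression", mse = c(0.1, 0.2), ntree = 10L),
  class = "fake_model"
)

# Build the same kind of list-matrix as in the question
answers <- rbind(fake_model, fake_model)

# One row is a plain named list of that object's components
row1 <- answers[1, ]
is.list(row1)
names(row1)

# The class attribute was dropped, so put it back by hand
class(row1) <- "fake_model"
row1$mse
```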

Answer 3: (score: 0)

The other answers provide solutions for storing random forest objects in a list, but they don't explain why those solutions work.

As @42- hints, it is not the pre-allocation step that solves the issue here.

The real problem is that a randomForest object is fundamentally a list (check is.list(randomForest(...))). When you write a statement such as:

list_of_rf = c()                                       # ... or list_of_rf = NULL
list_of_rf = rbind(list_of_rf, randomForest(...))      # ... or list_of_rf = c(list_of_rf, randomForest(...))

you are essentially asking R to concatenate an empty object with a list. Instead of producing a list of length 1 (holding the random forest model), this statement produces a list containing all of the random forest model's components! You can verify this by typing in your R console:

> length(list_of_rf)

[1] 19

There are several ways to force R to perform the operation that you want:

  1. explicit assignment into the list (cf. @joran's answer, although there is no need to pre-allocate):

    list_of_rf = NULL
    list_of_rf[[1]] = randomForest(...)
    
  2. let lapply (or similar) build the list (cf. @mbq's answer):

    list_of_rf = lapply(..., function(i) randomForest(...))
    
  3. wrap the random forest in a list, so that the concatenation strips only that outer wrapper and leaves the model intact:

    list_of_rf = NULL
    list_of_rf = c(list_of_rf, list(randomForest(...)))
    

Finally, if you made a mistake and unlisted a randomForest model that took 10 hours to compute, don't sweat; you can still restore it as follows:

list_of_rf = NULL
list_of_rf = c(list_of_rf, randomForest(...)) # oops, mistake
rf = as.vector(list_of_rf)[1:19]
class(rf) = 'randomForest'
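
The flattening and the repair can both be reproduced without the randomForest package. In this sketch `fake_model` is a hypothetical three-component S3 object standing in for a real fit, so the component count is 3 rather than 19.

```r
# Hypothetical stand-in for a fitted model: an S3 object built on a list
fit <- structure(
  list(type = "regression", mse = c(0.1, 0.2), ntree = 10L),
  class = "fake_model"
)

# Concatenating the bare object spills its components into the outer list
flat <- c(NULL, fit)
length(flat)            # one entry per component, class is gone

# Wrapping it in list() keeps the object in one piece
wrapped <- c(NULL, list(fit))
length(wrapped)         # a single element
class(wrapped[[1]])     # the original class survives

# Repair after the mistake: take the components back and restore the class
restored <- flat[seq_along(unclass(fit))]
class(restored) <- "fake_model"
```

The same reasoning applies when the accumulator already holds several models: only the wrapped form adds exactly one element per model.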