Question

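Thanks to the ddply function in Hadley's plyr package, we can take a data frame, split it into sub-data-frames by a factor, send each one to a function, and then combine the function's results for each sub-data-frame into a new data frame.

But what if the function returns an object of a class such as glm or, in my case, c("glm", "lm")? Such objects can't be combined into a data frame, can they? I get this error: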

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class 'c("glm", "lm")' into a data.frame

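Is there a more flexible data structure that can hold all the complex glm-class results of my function calls while preserving the information about the data frame subsets?

Or should this be done in an entirely different way?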

Answer 1

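Just to expand on my comment: plyr has a set of functions for each combination of input and output types. When your function returns something that cannot be converted to a data.frame, you should use a list as the output type. So instead of ddply, use dlply.

If you then want to do something with each model and convert the results to a data.frame, ldply is the key.

Let's create some models using dlply: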

library(plyr)
# Fit one linear model per tension level; dlply returns a list of lm objects
list_of_models <- dlply(warpbreaks, .(tension), function(X) lm(breaks ~ wool, data = X))
str(list_of_models, 1)
# List of 3
#  $ L:List of 13
#   ..- attr(*, "class")= chr "lm"
#  $ M:List of 13
#   ..- attr(*, "class")= chr "lm"
#  $ H:List of 13
#   ..- attr(*, "class")= chr "lm"
#  - attr(*, "split_type")= chr "data.frame"
#  - attr(*, "split_labels")='data.frame':        3 obs. of  1 variable:

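This gives a list of lm models.

With ldply you can then create a data.frame, for example:

Predictions from each model (each model is applied to the full warpbreaks data set):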
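Since the question is about glm rather than lm, here is a minimal sketch of the same idea with glm. The Poisson family and the name glm_models are illustrative assumptions, not something taken from the question. It also shows where the information about the subsets ends up: in the list names and in the "split_labels" attribute.

```r
library(plyr)

# Fit one glm per tension level (family = poisson is an assumption for illustration)
glm_models <- dlply(warpbreaks, .(tension), function(X) {
  glm(breaks ~ wool, data = X, family = poisson)
})

class(glm_models$L)               # class vector of a single fitted model
names(glm_models)                 # one list element per tension level
attr(glm_models, "split_labels")  # data frame holding the grouping values
```

Each element is a full model object, so the usual extractors such as coef(glm_models$M) or summary(glm_models$H) should work on it directly.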

ldply(list_of_models, function(model) {
    data.frame(fit=predict(model, warpbreaks))
})
#     tension     fit
# 1         L 44.5556
# 2         L 44.5556
# 3         L 44.5556

Some statistics for each model:

ldply(list_of_models, function(model) {
  c(
    aic = extractAIC(model),
    deviance = deviance(model),
    logLik = logLik(model),
    confint = confint(model),
    coef = coef(model)
  )
})
# tension aic1    aic2 deviance   logLik confint1  confint2 confint3 confint4 coef.(Intercept) coef.woolB
# 1       L    2 98.3291  3397.78 -72.7054  34.2580 -30.89623  54.8531 -1.77044          44.5556  -16.33333
# 2       M    2 81.1948  1311.56 -64.1383  17.6022  -4.27003  30.3978 13.82559          24.0000    4.77778
# 3       H    2 76.9457  1035.78 -62.0137  18.8701 -13.81829  30.2411  2.26273          24.5556   -5.77778
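Flattening everything with c() produces column names such as confint1 to confint4 that are hard to read back. As one possible alternative, the sketch below returns a small data.frame per model with one row per coefficient. The names coef_table and term are hypothetical, and the models are refit so that the block is self-contained.

```r
library(plyr)

# Same models as above: one lm per tension level
list_of_models <- dlply(warpbreaks, .(tension), function(X) lm(breaks ~ wool, data = X))

# One row per coefficient and model, keeping the coefficient name in its own column
coef_table <- ldply(list_of_models, function(model) {
  est <- coef(summary(model))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(term = rownames(est), est, row.names = NULL, check.names = FALSE)
})

coef_table
```

The result is in long format, which is usually easier to filter or plot than one wide row per model.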
