Question

Consider the following dataset (InsectSprays).

> d <- InsectSprays
> str(d)
'data.frame':   72 obs. of  2 variables:
 $ count: num  10 7 20 14 14 12 10 23 17 20 ...
 $ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

For each level of the factor (i.e., each spray type), how can I add the group's total count to every observation? That is, we want:

> head(d)
  count spray total
1    10     A   174
2     7     A   174
3    20     A   174
4    14     A   174
5    14     A   174
6    12     A   174

Some lectures suggest using ddply:

> library(plyr)
> head(ddply(d, .(spray), summarize, sum=ave(count, FUN=sum)))
  spray sum
1     A 174
2     A 174
3     A 174
4     A 174
5     A 174
6     A 174

Does this command have any particular advantage over simply using ave by itself?

> d$total <- ave(d$count, d$spray, FUN=sum)
> head(d)
  count spray total
1    10     A   174
2     7     A   174
3    20     A   174
4    14     A   174
5    14     A   174
6    12     A   174

I'm not saying ddply has no value, but in this particular example I don't see the point of using it.

Is there a specific advantage to applying ddply here?

Answer 1

I don't know of one. In terms of speed, at least, a quick benchmark favors ave on its own:

> library("microbenchmark")
> microbenchmark(ddply(d, .(spray), summarize, sum=ave(count, FUN=sum)), d$total <- ave(d$count, d$spray, FUN=sum))
Unit: microseconds
                                                       expr      min        lq    median        uq       max neval
 ddply(d, .(spray), summarize, sum = ave(count, FUN = sum)) 4262.996 4418.8750 4504.3195 4620.7480 10167.530   100
                d$total <- ave(d$count, d$spray, FUN = sum)  222.080  232.2795  249.2145  267.8815   620.822   100
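
As a side check, here is a minimal sketch that uses base R only (so it does not depend on plyr being installed) to confirm that ave really does broadcast each spray's total back to every row. The names `totals` and `check` are introduced here for illustration and are not part of the original question.

```r
# Base R only; InsectSprays ships with the datasets package.
d <- InsectSprays

# Per-group totals, repeated on every row of the matching group.
d$total <- ave(d$count, d$spray, FUN = sum)

# Independent check: one total per spray level, then looked up row by row.
totals <- tapply(d$count, d$spray, sum)
check <- as.numeric(unname(totals[as.character(d$spray)]))

# Stops with an error if the two approaches disagree.
stopifnot(isTRUE(all.equal(d$total, check)))
```

Unlike the ddply call with summarize, this keeps the original count column and the original row order, which may matter if the data frame is not already sorted by group.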
