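I would like to rewrite the following somewhat complex plyr command so that it runs faster, using aggregate, tapply, or data.table instead.
The function lets you pass in several ID (grouping) variables and several measure variables, and it returns the mean and standard deviation of each measure for every group. On larger data sets, however, it is probably not the most efficient approach.
Here is the code: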
library(ggplot2)  # provides the diamonds data set
library(plyr)

mean_sd_for_several_variables <- function(df, lvls, measures) {
  res <- ddply(df, lvls, function(x) {
    ret <- vector()
    for (measure in measures) {
      # x[[measure]] returns a plain vector even when df is a tibble
      # (x[, measure] would return a one-column tibble and mean() would give NA)
      mean_sd <- c(mean(x[[measure]]), sd(x[[measure]]))
      names(mean_sd) <- c(paste0("mean_", measure), paste0("sd_", measure))
      ret <- c(ret, mean_sd)
    }
    ret
  })
  print(res)
}
Calling it like this:
mean_sd_for_several_variables(diamonds, c("color", "cut"), c("price", "depth"))
...returns:
color cut mean_price sd_price mean_depth sd_depth
1 D Fair 4291.1 3286.1 64.048 3.29220
2 D Good 3405.4 3175.1 62.366 2.22240
3 D Very Good 3470.5 3523.8 61.750 1.46223
4 D Premium 3631.3 3711.6 61.169 1.15806
5 D Ideal 2629.1 3001.1 61.678 0.71201
6 E Fair 3682.3 2976.7 63.320 4.42103
7 E Good 3423.6 3330.7 62.204 2.23059
8 E Very Good 3214.7 3408.0 61.730 1.42377
9 E Premium 3538.9 3795.0 61.176 1.16454
10 E Ideal 2597.6 2956.0 61.687 0.70718
11 F Fair 3827.0 3223.3 63.508 3.70209
12 F Good 3495.8 3202.4 62.202 2.23976
13 F Very Good 3778.8 3786.1 61.722 1.38939
14 F Premium 4324.9 4012.0 61.260 1.16775
15 F Ideal 3374.9 3766.6 61.676 0.69398
16 G Fair 4239.3 3609.6 64.340 3.57340
17 G Good 4123.5 3702.5 62.527 2.03893
18 G Very Good 3872.8 3861.4 61.841 1.33169
19 G Premium 4500.7 4356.6 61.279 1.15341
20 G Ideal 3720.7 4006.3 61.700 0.68714
21 H Fair 5135.7 3886.5 64.585 3.14173
22 H Good 4276.3 4020.7 62.500 2.09212
23 H Very Good 4535.4 4185.8 61.968 1.31895
24 H Premium 5216.7 4466.2 61.322 1.15164
25 H Ideal 3889.3 4013.4 61.733 0.72939
26 I Fair 4685.4 3730.3 64.221 3.68771
27 I Good 5078.5 4631.7 62.475 2.17958
28 I Very Good 5255.9 4687.1 61.935 1.32890
29 I Premium 5946.2 5053.7 61.329 1.15338
30 I Ideal 4452.0 4505.2 61.794 0.72334
31 J Fair 4975.7 4050.5 64.357 3.31595
32 J Good 4574.2 3707.8 62.396 2.12091
33 J Very Good 5103.5 4135.7 61.902 1.33679
34 J Premium 6294.6 4788.9 61.390 1.13989
35 J Ideal 4918.2 4476.2 61.822 0.94669
Answer 0 (score: 4)
Here is a data.table solution:
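library(data.table)

# Return a named list; data.table turns each list element into one output column
mean_and_sd <- function(cols) {
  means <- lapply(cols, mean)
  sds <- lapply(cols, sd)
  names(means) <- paste0("mean_", names(cols))
  names(sds) <- paste0("sd_", names(cols))
  c(means, sds)
}

DT <- as.data.table(diamonds)
DT[, mean_and_sd(.SD), by = c("color", "cut"), .SDcols = c("price", "depth")]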
You can wrap this in a function that takes the grouping and measure columns as inputs and returns the corresponding data frame.
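A minimal sketch of such a wrapper, assuming the same three inputs as the question's function (a data frame, a character vector of grouping columns, and a character vector of measure columns). The name `mean_sd_dt` and the `toy` data frame are hypothetical. The loop inside `j` builds the output list as mean, sd per measure, so the columns come out in the same order as in the plyr version.

```r
library(data.table)

# Hypothetical helper: group df by the columns named in lvls and return
# mean_<measure> and sd_<measure> for every column named in measures
mean_sd_dt <- function(df, lvls, measures) {
  dt <- as.data.table(df)
  dt[, {
    out <- list()
    for (m in measures) {
      out[[paste0("mean_", m)]] <- mean(.SD[[m]])
      out[[paste0("sd_", m)]] <- sd(.SD[[m]])
    }
    out  # named list: one output column per element
  }, by = lvls, .SDcols = measures]
}

# Hypothetical toy data to show the shape of the result
toy <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 10, 20),
  y = c(2, 2, 5, 7)
)
res <- mean_sd_dt(toy, "g", c("x", "y"))
res
```

With the question's data this would be called as `mean_sd_dt(diamonds, c("color", "cut"), c("price", "depth"))`. Because `by` (not `keyby`) is used, groups appear in order of first appearance rather than sorted.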
Answer 1 (score: 4)
Using aggregate:
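# One formula handles several measures at once; because FUN returns a named
# vector, each measure comes back as a two-column (mean, sd) matrix column
result <- aggregate(cbind(price, depth) ~ color + cut,
                    FUN = function(x) c(mean = mean(x), sd = sd(x)),
                    data = diamonds)
# Flatten the matrix columns into price.mean, price.sd, depth.mean, depth.sd
do.call(data.frame, result)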
color cut price.mean price.sd depth.mean depth.sd
1 D Fair 4291.061 3286.114 64.0484663 3.2921972
2 E Fair 3682.312 2976.652 63.3196429 4.4210329
3 F Fair 3827.003 3223.303 63.5080128 3.7020938
4 G Fair 4239.255 3609.644 64.3398089 3.5733985
5 H Fair 5135.683 3886.482 64.5851485 3.1417311
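The formula call above hard-codes the column names. A minimal base-R sketch that takes them as character vectors instead, assuming the data-frame method of `aggregate()` with `by` set to a data frame of the grouping columns (a data frame is a list, which is what `by` expects). The name `mean_sd_aggregate` and the `toy` data are hypothetical. As in the output above, `aggregate()` sorts the groups, with the first grouping variable varying fastest.

```r
# Hypothetical helper (base R only): mean and sd of each measure per group
mean_sd_aggregate <- function(df, lvls, measures) {
  # Convert tibbles/data.tables to a plain data.frame so that df[cols]
  # selects columns the way base R does
  df <- as.data.frame(df)
  res <- aggregate(
    x = df[measures],
    by = df[lvls],
    FUN = function(v) c(mean = mean(v), sd = sd(v))
  )
  # Each measure is stored as a matrix column; flatten into
  # <measure>.mean and <measure>.sd
  do.call(data.frame, res)
}

# Hypothetical toy data to show the shape of the result
toy <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 10, 20),
  y = c(2, 2, 5, 7)
)
agg_res <- mean_sd_aggregate(toy, "g", c("x", "y"))
agg_res
```

For the question's data the call would be `mean_sd_aggregate(diamonds, c("color", "cut"), c("price", "depth"))`, which should reproduce the columns shown above.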