Applying the same function to multiple columns with the plyr package

Asked: 2013-04-18 18:30:12

Tags: r plyr

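I would like to use the ddply function to apply the same function to multiple columns, but right now I am writing every column out by hand in a single call. Is there a better way to do this?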

Here is a simplified version of the data:

data<-data.frame(TYPE=as.integer(runif(20,1,3)),A_MEAN_WEIGHT=runif(20,1,100),B_MEAN_WEIGHT=runif(20,1,10))

I want to get the sums of the A_MEAN_WEIGHT and B_MEAN_WEIGHT columns like this:

ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT))

But in my actual data I have more than 8 "*_MEAN_WEIGHT" columns, and I am tired of writing the same thing out 8 times:

ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT),MEAN_C=sum(C_MEAN_WEIGHT),MEAN_D=sum(D_MEAN_WEIGHT),MEAN_E=sum(E_MEAN_WEIGHT),MEAN_F=sum(F_MEAN_WEIGHT),MEAN_G=sum(G_MEAN_WEIGHT),MEAN_H=sum(H_MEAN_WEIGHT))

Is there a better way to write this? Thanks for your help!

2 Answers:

Answer 0: (score: 6)

The plyr-centric approach is to use colwise

For example

 ddply(data, .(TYPE), colwise(sum))
  TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1    1      319.8977      60.80317
2    2      621.6745      37.05863

If you only want a subset of the columns, you can pass the column names as the .cols argument.

You can also use numcolwise or catcolwise to operate only on the numeric or the categorical columns.
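As a rough sketch of the two options just described: the data below is made up to match the shape of the question's example, and the extra NOTE column and the name weight_cols are illustrative assumptions, not part of the original post. The columns are picked by name pattern, so the call stays the same with 8 or more "*_MEAN_WEIGHT" columns.

```r
library(plyr)

set.seed(42)  # illustrative data in the same shape as the question's example
data <- data.frame(TYPE = rep(1:2, each = 10),
                   A_MEAN_WEIGHT = runif(20, 1, 100),
                   B_MEAN_WEIGHT = runif(20, 1, 10),
                   NOTE = rep(c("x", "y"), 10))  # hypothetical non-numeric column

# Pick the columns by name pattern instead of typing each one
weight_cols <- grep("_MEAN_WEIGHT$", names(data), value = TRUE)

# Sum only the selected columns within each TYPE
by_name <- ddply(data, .(TYPE), colwise(sum, .cols = weight_cols))

# Sum every numeric column; the non-numeric NOTE column is skipped
by_numeric <- ddply(data, .(TYPE), numcolwise(sum))
```

Both calls should return one row per TYPE with the summed weight columns. Selecting by name is the safer choice when the data frame also holds numeric columns you do not want summed.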

Note that for the most basic uses you can use sapply in place of colwise

ddply(data, .(TYPE), sapply, FUN = 'mean') 

The idiomatic data.table approach is to use lapply(.SD, fun)

For example

library(data.table)
dt <- data.table(data)
dt[,lapply(.SD, sum) ,by = TYPE]
   TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1:    2      621.6745      37.05863
2:    1      319.8977      60.80317
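If the table also contains columns that should not be summed, one option is to restrict .SD with .SDcols. The following is a minimal sketch on made-up data; the OTHER column and the name weight_cols are assumptions for illustration only.

```r
library(data.table)

set.seed(42)  # illustrative data in the same shape as the question's example
dt <- data.table(TYPE = rep(1:2, each = 10),
                 A_MEAN_WEIGHT = runif(20, 1, 100),
                 B_MEAN_WEIGHT = runif(20, 1, 10),
                 OTHER = runif(20))  # hypothetical column we do not want summed

# Select the weight columns by name pattern
weight_cols <- grep("_MEAN_WEIGHT$", names(dt), value = TRUE)

# .SDcols limits .SD to the chosen columns before lapply runs over them
sums <- dt[, lapply(.SD, sum), by = TYPE, .SDcols = weight_cols]
```

The result keeps TYPE plus one summed column per entry in weight_cols, and OTHER is left out.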

Answer 1: (score: 4)

Try this:

ddply(data, .(TYPE), colSums)

This is the (slower) equivalent of the above, which can be adapted to use any function in place of sum:

ddply(data, .(TYPE), function(x) {apply(x, 2, sum)})

If you want to preserve the .(TYPE) column, do something like this:

ddply(data, .(TYPE), function(x) {apply(x[,names(x) != "TYPE"], 2, sum)})

Better still, use data.table instead of plyr

library(data.table)
dt = data.table(data)

# just sums
dt[, data.table(t(colSums(.SD))), by = TYPE]

# sum for "A" and "B", and sqrt(sum) for "C" and "D"
# note: you will have to call setnames() to fix the column names after
dt[, data.table(t(colSums(.SD[, c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT"), with = FALSE])),
                t(apply(.SD[, c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT"), with = FALSE],
                        2, function(x) sqrt(sum(x))))),
     by = TYPE]
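The setnames() step mentioned in the comment above might look like the following. This is only a sketch: the data is made up so that the C and D columns exist, it swaps the colSums/apply calls for lapply over two column groups, and the output names (SUM_A and so on) are arbitrary illustrative choices.

```r
library(data.table)

set.seed(1)  # illustrative data; the question's sample has no C or D columns
dt <- data.table(TYPE = rep(1:2, each = 5),
                 A_MEAN_WEIGHT = runif(10, 1, 100),
                 B_MEAN_WEIGHT = runif(10, 1, 10),
                 C_MEAN_WEIGHT = runif(10, 1, 10),
                 D_MEAN_WEIGHT = runif(10, 1, 10))

sum_cols  <- c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT")
sqrt_cols <- c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT")

# Concatenate two named lists: plain sums for A/B, sqrt(sum) for C/D
res <- dt[, c(lapply(.SD[, sum_cols, with = FALSE], sum),
              lapply(.SD[, sqrt_cols, with = FALSE], function(x) sqrt(sum(x)))),
          by = TYPE]

# Rename the result columns by reference so they say what was computed
setnames(res, c(sum_cols, sqrt_cols),
         c("SUM_A", "SUM_B", "SQRT_SUM_C", "SQRT_SUM_D"))
```

Because setnames() modifies res in place, no reassignment is needed after the call.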