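I want to use ddply to apply the same function to multiple columns, but so far I have been writing every column out by hand in a single call. Is there a better way?
Here is a simplified version of the data: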
data<-data.frame(TYPE=as.integer(runif(20,1,3)),A_MEAN_WEIGHT=runif(20,1,100),B_MEAN_WEIGHT=runif(20,1,10))
I want to get the sums of the A_MEAN_WEIGHT and B_MEAN_WEIGHT columns by doing this:
ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT))
But in my actual data I have more than 8 "*_MEAN_WEIGHT" columns, and I am tired of writing the same thing 8 times:
ddply(data,.(TYPE),summarise,MEAN_A=sum(A_MEAN_WEIGHT),MEAN_B=sum(B_MEAN_WEIGHT),MEAN_C=sum(C_MEAN_WEIGHT),MEAN_D=sum(D_MEAN_WEIGHT),MEAN_E=sum(E_MEAN_WEIGHT),MEAN_F=sum(F_MEAN_WEIGHT),MEAN_G=sum(G_MEAN_WEIGHT),MEAN_H=sum(H_MEAN_WEIGHT))
Is there a better way to write this? Thanks for your help!
Answer 0 (score: 6)
The plyr-centric approach is to use colwise, for example:
ddply(data, .(TYPE), colwise(sum))
  TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1    1      319.8977      60.80317
2    2      621.6745      37.05863
If you only want a subset of the columns, you can pass the column names as the .cols argument.
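A minimal sketch of the subset case, assuming the weight columns can be identified by their "_MEAN_WEIGHT" suffix. The SAMPLE_ID column, the fixed seed, and the names weight_cols and sums_by_type are hypothetical, added only for illustration:

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(
  TYPE = as.integer(runif(20, 1, 3)),
  SAMPLE_ID = 1:20,  # hypothetical extra column that should NOT be summed
  A_MEAN_WEIGHT = runif(20, 1, 100),
  B_MEAN_WEIGHT = runif(20, 1, 10)
)

# Collect every column whose name ends in "_MEAN_WEIGHT"
weight_cols <- grep("_MEAN_WEIGHT$", names(data), value = TRUE)

# Sum only the selected columns within each TYPE group
sums_by_type <- ddply(data, .(TYPE), colwise(sum, .cols = weight_cols))
print(sums_by_type)
```

Selecting the columns by pattern means the call should not need to change when more "*_MEAN_WEIGHT" columns are added.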
You can also use numcolwise or catcolwise to operate only on numeric or categorical columns.
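As a rough illustration of numcolwise, the sketch below adds a hypothetical character column LABEL (not part of the original data) and lets numcolwise skip it. The seed and the name num_sums are likewise assumptions:

```r
library(plyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
data <- data.frame(
  TYPE = as.integer(runif(20, 1, 3)),
  LABEL = rep(c("x", "y"), 10),  # hypothetical non-numeric column
  A_MEAN_WEIGHT = runif(20, 1, 100),
  B_MEAN_WEIGHT = runif(20, 1, 10)
)

# numcolwise(sum) applies sum to the numeric columns only,
# so LABEL is dropped instead of causing an error
num_sums <- ddply(data, .(TYPE), numcolwise(sum))
print(num_sums)
```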
Note that for the most basic uses you can use sapply in place of colwise (shown here with mean rather than sum):
ddply(data, .(TYPE), sapply, FUN = 'mean')
The idiomatic data.table approach is to use lapply(.SD, fun), for example:
library(data.table)
dt <- data.table(data)
dt[, lapply(.SD, sum), by = TYPE]
   TYPE A_MEAN_WEIGHT B_MEAN_WEIGHT
1:    2      621.6745      37.05863
2:    1      319.8977      60.80317
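If the table also holds columns that should not be summed, one option is to restrict .SD with .SDcols. This is a sketch under the assumption that the relevant columns share the "_MEAN_WEIGHT" suffix. The SAMPLE_ID column, the seed, and the names weight_cols and sums_dt are hypothetical:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dt <- data.table(
  TYPE = as.integer(runif(20, 1, 3)),
  SAMPLE_ID = 1:20,  # hypothetical extra column that should NOT be summed
  A_MEAN_WEIGHT = runif(20, 1, 100),
  B_MEAN_WEIGHT = runif(20, 1, 10)
)

# Collect every column whose name ends in "_MEAN_WEIGHT"
weight_cols <- grep("_MEAN_WEIGHT$", names(dt), value = TRUE)

# .SDcols limits .SD to the selected columns before lapply runs
sums_dt <- dt[, lapply(.SD, sum), by = TYPE, .SDcols = weight_cols]
print(sums_dt)
```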
Answer 1 (score: 4)
Try this:
ddply(data, .(TYPE), colSums)
Here is a (slower) equivalent of the above, which can be adapted to use any function in place of sum:
ddply(data, .(TYPE), function(x) {apply(x, 2, sum)})
If you want to preserve the .(TYPE) column, exclude it from the summed columns like this:
ddply(data, .(TYPE), function(x) {apply(x[,names(x) != "TYPE"], 2, sum)})
Better yet, use data.table instead of plyr:
library(data.table)
dt = data.table(data)
# just sums
dt[, data.table(t(colSums(.SD))), by = TYPE]
# sum for "A" and "B", and sqrt(sum) for "C" and "D"
# note: you will have to call setnames() to fix the column names after
dt[, data.table(t(colSums(.SD[, c("A_MEAN_WEIGHT", "B_MEAN_WEIGHT"), with = FALSE])),
                t(apply(.SD[, c("C_MEAN_WEIGHT", "D_MEAN_WEIGHT"), with = FALSE],
                        2, function(x) sqrt(sum(x))))),
   by = TYPE]