Aggregating multiple dependent measures

Date: 2015-08-17 18:16:36

Tags: r aggregate

I need to aggregate a number of dependent measures (DMs) in R. I found the following discussion very helpful:

Aggregate / summarize multiple variables per group (i.e. sum, mean, etc)

Based on it, the code below does essentially what I need. However, as the number of DMs grows (and I have many), it becomes very verbose:

aggregate(cbind(DM1, DM2, DM3, DM4, DM5 ... DMn) ~ F1 + F2 +
           F3, data = sst2, mean, na.rm=TRUE) 

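So I'm wondering whether there is a more efficient way to specify the DMs without typing each one individually. Most of the DMs of interest are adjacent to each other (i.e. DM3, DM4, DM5, and so on), so I was thinking of something like cbind(DM1, DM3:DM10, DM14), but that doesn't seem to work. I also tried generating a list of the relevant column names. Unfortunately, that didn't work either: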

pr<-colnames(sst2)
pr2<-pr[pr!="DM2" & pr!="DM11" & pr!="DM12" & pr!="DM13"]
pr3<-noquote(paste(pr2,collapse=","))
pp<-aggregate(cbind(pr3) ~ F1 + F2 +
           F3, data = sst2, mean, na.rm=TRUE) 
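The attempt above fails because `paste(..., collapse = ",")` produces a single character string, and `cbind(pr3)` inside the formula is evaluated as that one string, not as the columns it names (`noquote()` only changes how the string prints). A minimal sketch of one fix, assuming DM columns named `DM1` to `DM14` as in the question and a small made-up `sst2`: build the whole formula as text first, then convert it with `as.formula()`.

```r
# Hypothetical toy data; column names follow the question (F1-F3, DM1-DM14)
set.seed(1)
sst2 <- data.frame(F1 = rep(c("A", "B"), each = 6),
                   F2 = rep(c("x", "y"), times = 6),
                   F3 = "D")
for (i in 1:14) sst2[[paste0("DM", i)]] <- round(runif(12, 0, 100))

# All DM columns, minus the ones the question excludes
dm <- grep("^DM[0-9]+$", names(sst2), value = TRUE)
dm <- setdiff(dm, c("DM2", "DM11", "DM12", "DM13"))

# Build "cbind(DM1, DM3, ..., DM14) ~ F1 + F2 + F3" as text, then parse it
fml <- as.formula(paste0("cbind(", paste(dm, collapse = ", "), ") ~ F1 + F2 + F3"))

pp <- aggregate(fml, data = sst2, FUN = mean, na.rm = TRUE)
pp
```

Because `cbind()` names its columns after the symbols it receives, `pp` gets one column per selected DM, named as in `dm`.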

Any suggestions on how to include a large number of DMs efficiently in aggregate (or in a related function such as ddply) would be very welcome.

2 answers:

Answer 0 (score: 1)

I believe this should work:

sst2 <- data.frame(F1=c("A","A","B","B","C","C"),
                   F2=c("A","A","A","B","B","B"),
                   F3=c("D","D","D","D","D","D"),
                   DM1=c(5,6,21,61,2,3),
                   DM2=c(1,5,3,6,1,6),
                   DM3=c(1,7,9,1,4,44))

n = 3 # number of DM columns
m = 2 # number of F columns

DM <- paste0("DM", 1:n)

attach(sst2)

# use sapply(DM,get) but this produces separate columns
tmp <- aggregate(sapply(DM, get) ~ F1 + F2, 
                 data = sst2, mean, na.rm=TRUE)

detach(sst2)

# combine these separate columns: tmp holds the m grouping columns
# followed by the n DM columns, and apply() averages each row
data.frame(F1 = tmp$F1, F2 = tmp$F2,
    DM = apply(tmp[(m+1):(m+n)], 1, mean))

#   F1 F2        DM
# 1  A  A  4.166667
# 2  B  A 11.000000
# 3  B  B 22.666667
# 4  C  B 10.000000
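The `attach()`/`get()` step is not strictly necessary. A minimal sketch, reusing the answer's `sst2` and `DM`, that calls the data-frame method of `aggregate()` directly: it returns one mean per DM column per group, and `rowMeans()` then reproduces the combined column shown above.

```r
sst2 <- data.frame(F1 = c("A","A","B","B","C","C"),
                   F2 = c("A","A","A","B","B","B"),
                   F3 = c("D","D","D","D","D","D"),
                   DM1 = c(5,6,21,61,2,3),
                   DM2 = c(1,5,3,6,1,6),
                   DM3 = c(1,7,9,1,4,44))
DM <- paste0("DM", 1:3)

# Data-frame method: columns to summarise in x, grouping columns in `by`
tmp <- aggregate(sst2[DM], by = sst2[c("F1", "F2")], FUN = mean, na.rm = TRUE)

# Per-group average of the per-DM means, as in the answer
res <- data.frame(tmp[c("F1", "F2")], DM = rowMeans(tmp[DM]))
res
```

One difference worth knowing: the formula method's default `na.action = na.omit` drops any row with an NA in any DM, whereas the data-frame method keeps such rows and lets `na.rm = TRUE` handle missing values column by column.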

Edit

If your variable names are different, the only line you need to change is

DM <- c("mean.go.RT", "mean.SRT", "mean.SSD", "SSRT")

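If these variables are already in your data frame, you can also get their names directly:

DM <- names(sst2)[4:6]

or pick whichever other columns you need (in place of 4 to 6).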
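Instead of hard-coding positions such as `4:6`, a range like the question's `DM3:DM10` can be resolved by name with `match()`. A minimal sketch, assuming DM columns named `DM1`, `DM2`, ... and made-up toy data; it also uses `.` on the left-hand side of the formula, which `aggregate()` expands to every column of `data` not used as a grouping variable.

```r
# Hypothetical toy data with 14 DM columns
sst2 <- data.frame(F1 = c("A", "A", "B", "B"), F2 = "x")
for (i in 1:14) sst2[[paste0("DM", i)]] <- i * (1:4)

nm <- names(sst2)
# The question's cbind(DM1, DM3:DM10, DM14), resolved by name rather than position
keep <- nm[c(match("DM1", nm), match("DM3", nm):match("DM10", nm), match("DM14", nm))]

# `.` on the left-hand side means "all other columns in data"
pp <- aggregate(. ~ F1 + F2, data = sst2[c("F1", "F2", keep)], FUN = mean)
pp
```

Selecting by name keeps the code correct if columns are later added or reordered, which positional indices like `4:6` do not.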

Answer 1 (score: 0)

An alternative solution using select, ddply and numcolwise:

library(plyr)   # load plyr before dplyr to avoid masking conflicts
library(dplyr)

sst21 <- data.frame(F1=c("A","A","B","B","C","C"),
                   F2=c("A","A","A","B","B","B"),
                   F3=c("D","D","D","D","D","D"),
                   DM1=c(5,6,21,61,2,3),
                   DM2=c(1,5,3,6,1,6),
                   DM3=c(1,7,9,1,4,44),
                   DM4=c(2,3,6,7,2,33),
                   DM5=c(44,55,66,77,55,88))

sel1 <- dplyr::select(sst21, starts_with("F"), DM1:DM3, DM5) # select columns of interest
sel1 <- dplyr::select(sst21, -DM4) # Alternative: specifying columns to be excluded

sst22 <- plyr::ddply(sel1, .(F1, F2, F3), plyr::numcolwise(mean, na.rm = TRUE)) # Aggregate selected data
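In dplyr 1.0.0 or later, `group_by()` with `summarise()` and `across()` covers the same ground as `ddply()` plus `numcolwise()` without loading plyr. A minimal sketch, assuming the answer's `sst21`; `where(is.numeric)` plays the role of `numcolwise()`.

```r
library(dplyr)

sst21 <- data.frame(F1 = c("A","A","B","B","C","C"),
                    F2 = c("A","A","A","B","B","B"),
                    F3 = c("D","D","D","D","D","D"),
                    DM1 = c(5,6,21,61,2,3),
                    DM2 = c(1,5,3,6,1,6),
                    DM3 = c(1,7,9,1,4,44),
                    DM4 = c(2,3,6,7,2,33),
                    DM5 = c(44,55,66,77,55,88))

sst22 <- sst21 %>%
  select(F1:F3, DM1:DM3, DM5) %>%   # columns of interest
  group_by(F1, F2, F3) %>%
  # mean of every numeric (DM) column within each group
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
sst22
```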