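I need to aggregate a number of dependent measures (DMs) in R. I found the following discussion very useful:
Aggregate / summarize multiple variables per group (i.e. sum, mean, etc)
Based on it, the code below essentially does what I need. However, it becomes very verbose as the number of DMs grows (and I have a lot of them):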
aggregate(cbind(DM1, DM2, DM3, DM4, DM5 ... DMn) ~ F1 + F2 + F3,
          data = sst2, mean, na.rm=TRUE)
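So I was wondering whether there is a more efficient way to write out the DMs without typing each one individually. Most of the DMs of interest sit next to each other (i.e. DM3, DM4, DM5, etc.), so I was thinking of something like cbind(DM1, DM3:DM10, DM14), but that does not seem to work. I also tried generating a list of the relevant column names. Unfortunately, that did not work either: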
pr<-colnames(sst2)
pr2<-pr[pr!="DM2" & pr!="DM11" & pr!="DM12" & pr!="DM13"]
pr3<-noquote(paste(pr2,collapse=","))
pp<-aggregate(cbind(pr3) ~ F1 + F2 +
F3, data = sst2, mean, na.rm=TRUE)
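Any suggestions on how to efficiently include a large number of DMs in aggregate (or in a related function such as ddply) would be most welcome.
Answer 0 (score: 1)
I believe this should work: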
sst2 <- data.frame(F1=c("A","A","B","B","C","C"),
F2=c("A","A","A","B","B","B"),
F3=c("D","D","D","D","D","D"),
DM1=c(5,6,21,61,2,3),
DM2=c(1,5,3,6,1,6),
DM3=c(1,7,9,1,4,44))
n = 3 # number of DM columns
m = 2 # number of F columns
DM <- paste0("DM", 1:n)
attach(sst2)
# use sapply(DM,get) but this produces separate columns
tmp <- aggregate(sapply(DM, get) ~ F1 + F2,
data = sst2, mean, na.rm=TRUE)
detach(sst2)
# combine these separate columns. The apply is to each row of tmp
# the DM columns start right after the m grouping columns of tmp
data.frame(F1 = tmp$F1, F2 = tmp$F2,
DM = apply(tmp[(m+1):(m+length(DM))], 1, mean))
# F1 F2 DM
# 1 A A 4.166667
# 2 B A 11.000000
# 3 B B 22.666667
# 4 C B 10.000000
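The same result can probably be reached without attach() and get(). The following is a minimal sketch, not the answerer's code: it assumes the example data frame sst2 from above and uses the data-frame method of aggregate() together with rowMeans(). The name by_cols is introduced here for illustration only. One difference to keep in mind: the formula method drops rows that contain an NA before aggregating (its default na.action is na.omit), whereas this form only removes NAs within each column through na.rm.

```r
# Example data copied from the answer above
sst2 <- data.frame(F1 = c("A", "A", "B", "B", "C", "C"),
                   F2 = c("A", "A", "A", "B", "B", "B"),
                   F3 = c("D", "D", "D", "D", "D", "D"),
                   DM1 = c(5, 6, 21, 61, 2, 3),
                   DM2 = c(1, 5, 3, 6, 1, 6),
                   DM3 = c(1, 7, 9, 1, 4, 44))

DM <- paste0("DM", 1:3)      # names of the measure columns
by_cols <- c("F1", "F2")     # grouping columns (hypothetical helper name)

# Data-frame method of aggregate(): columns are picked by name,
# so neither a formula nor attach()/get() is needed
tmp <- aggregate(sst2[DM], by = sst2[by_cols], FUN = mean, na.rm = TRUE)

# Average the per-group DM means across the DM columns
res <- data.frame(tmp[by_cols], DM = rowMeans(tmp[DM]))
print(res)
```

Selecting the columns by name (tmp[DM]) rather than by position also sidesteps the index arithmetic on m and n.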
Edit
If your variable names are different, the only line that needs to change is
DM <- c("mean.go.RT", "mean.SRT", "mean.SSD", "SSRT")
If these variables are in your data frame, you can easily get them with
DM <- names(sst2)[4:6]
or any other columns you want (i.e. in place of 4-6).
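If a separate mean per DM is wanted, as in the question, a vector of column names can feed aggregate() in at least two ways. The sketch below is an assumption-laden illustration rather than part of the original answer: the data frame is made up for the example (five DM columns, so that a range is meaningful) and the names keep, f, res1 and res2 are hypothetical. It also hints at why the attempts in the question fail: cbind(pr3) binds a single character string rather than columns, and inside cbind() the expression DM3:DM10 applies the sequence operator to the column values, not to column positions.

```r
# Illustrative data with five DM columns
sst2 <- data.frame(F1 = c("A", "A", "B", "B", "C", "C"),
                   F2 = c("A", "A", "A", "B", "B", "B"),
                   F3 = c("D", "D", "D", "D", "D", "D"),
                   DM1 = c(5, 6, 21, 61, 2, 3),
                   DM2 = c(1, 5, 3, 6, 1, 6),
                   DM3 = c(1, 7, 9, 1, 4, 44),
                   DM4 = c(2, 3, 6, 7, 2, 33),
                   DM5 = c(44, 55, 66, 77, 55, 88))

# Option 1: subset() accepts ranges of column names in `select`,
# and the dot on the left-hand side means "all remaining columns"
keep <- subset(sst2, select = c(F1, F2, F3, DM1:DM3, DM5))
res1 <- aggregate(. ~ F1 + F2 + F3, data = keep, FUN = mean, na.rm = TRUE)

# Option 2: build the cbind(...) formula from a character vector of names
DM <- setdiff(grep("^DM", names(sst2), value = TRUE), "DM4")
f <- as.formula(paste0("cbind(", paste(DM, collapse = ", "), ") ~ F1 + F2 + F3"))
res2 <- aggregate(f, data = sst2, FUN = mean, na.rm = TRUE)

print(res1)
print(res2)
```

Option 1 is shorter when the wanted columns are adjacent; Option 2 is handy when the names come from a pattern or an exclusion list.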
Answer 1 (score: 0)
An alternative solution using select, ddply and numcolwise:
library(plyr)   # load plyr before dplyr to avoid masking dplyr functions
library(dplyr)
sst21 <- data.frame(F1=c("A","A","B","B","C","C"),
F2=c("A","A","A","B","B","B"),
F3=c("D","D","D","D","D","D"),
DM1=c(5,6,21,61,2,3),
DM2=c(1,5,3,6,1,6),
DM3=c(1,7,9,1,4,44),
DM4=c(2,3,6,7,2,33),
DM5=c(44,55,66,77,55,88))
sel1 <- dplyr::select(sst21, starts_with("F"), DM1:DM3, DM5) # select the columns of interest
sel1 <- dplyr::select(sst21, -DM4) # alternative: specify the columns to exclude instead
sst22 <- plyr::ddply(sel1, .(F1, F2, F3), plyr::numcolwise(mean, na.rm = TRUE)) # aggregate the selected columns
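For readers who prefer to stay within dplyr, the selection and the aggregation can likely be combined in a single pipeline. This is only a sketch under the assumption that dplyr 1.0.0 or later is installed (across() was introduced in that release); it is not part of the original answer, and the result name sst23 is made up for the example.

```r
library(dplyr)

# Example data copied from the answer above
sst21 <- data.frame(F1 = c("A", "A", "B", "B", "C", "C"),
                    F2 = c("A", "A", "A", "B", "B", "B"),
                    F3 = c("D", "D", "D", "D", "D", "D"),
                    DM1 = c(5, 6, 21, 61, 2, 3),
                    DM2 = c(1, 5, 3, 6, 1, 6),
                    DM3 = c(1, 7, 9, 1, 4, 44),
                    DM4 = c(2, 3, 6, 7, 2, 33),
                    DM5 = c(44, 55, 66, 77, 55, 88))

# Group by the factors, then take the mean of a range of columns plus DM5
sst23 <- sst21 %>%
  group_by(F1, F2, F3) %>%
  summarise(across(c(DM1:DM3, DM5), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(sst23)
```

The column range DM1:DM3 works here because across() uses the same selection syntax as select().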