Using multiple `by` groupings in a data.table via a loop

Time: 2013-12-21 04:49:08

Tags: r data.table

Suppose I have a data.table with three value columns X1, X2, X3, further grouping columns V1, V2, ..., VN, and a set of functions FUN: mean, min, max.

dt<-data.table(X1,X2,X3,V1,V2,...,VN)

I want to run the following in a loop:

dt[,Y112_mean:=mean(X1), by=list(V1,V2)]
dt[,Y113_mean:=mean(X1), by=list(V1,V3)]
...
dt[,Y11N_mean:=mean(X1), by=list(V1,VN)]
...
dt[,Yijk_mean:=mean(Xi), by=list(Vj,Vk)]
...
dt[,Yijk_max:=max(Xi), by=list(Vj,Vk)]
...
dt[,Yijk_min:=min(Xi), by=list(Vj,Vk)]

I tried to do it as follows:

for (i in 1:3) {
   for (j in 1:(N-1)) {
      for (k in (j+1):N) {
        for (FUN in c(mean,max,min)) {
   ...
   # get `mean(X1)` or `max(X2)` etc.
   e<-as.name(paste0(substitute(FUN),"(X",i,")"))

   # get `list(V1,V2)` or `list(V2,V3)` etc.
   f<-as.name(paste0("list(V",j,",","V",k,")"))

   # get `Y123_mean` etc.
   g<-as.name(paste0("Y",i,j,k,"_",substitute(FUN)))

   # get the column now (this doesn't work below).
   # e.g. of error `list(V1,V2)` not found.
   dt[,eval(g):=eval(e),by=eval(f)]
   ...
   }
  }
 }
}

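Clearly, my use of eval with data.table is probably wrong. I noticed that the data.table documentation mentions .BY, and I tried several combinations with it, but could not get that to work either.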

Another option I tried was:

dt[,(paste0("Y",i,j,k,"_",substitute(FUN))):=FUN(dt[[paste0("X",i)]]),by=eval(f)]

But I get an error from the eval(f) part, e.g. that list(V1,V2) cannot be found.

I suspect I am making several mistakes. What is the correct syntax?

Thanks.

Edit

Here is a minimal reproducible example:

Assume VN is V4 (that is, N = 4):

X1<-seq(1,1000)
X2<-seq(1,1000)
X3<-seq(1,1000)
V1<-rep(seq(1,10),100)
V2<-rep(seq(1,5),200)
V3<-rep(seq(1,4),250)
V4<-rep(seq(1,2),500)

2 answers:

Answer 0: (score: 0)

Try something along these lines:

foo = data.frame(Species=c(rep("A",4),"B",rep("C",3),"D","D"), 
             Effect=c(rep("Reproduction",3), rep("Growth",2),
                      "Reproduction", rep("Mortality",2), rep("Growth",2)), 
             Concentration=c(1.2,1.4,1.3,1.5,1.6,1.2,1.1,1,1.3,1.4))

Using the plyr package:

library(plyr)
ddply(foo, .(Species,Effect), function(x) mean(x[,"Concentration"]))  

You can also try data.table:

 datDT <- data.table(foo, key="Species,Effect")
 datDT[, list(Concentration = mean(Concentration)), by = key(datDT)] 
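Because the question needs the grouping columns to be chosen inside a loop, it may help to note that `by` also accepts a character vector of column names. The following is a minimal sketch that rebuilds foo from above; `group_cols` and `res` are names introduced here purely for illustration.

```r
library(data.table)

# Same example data as in the answer above
foo <- data.frame(
  Species = c(rep("A", 4), "B", rep("C", 3), "D", "D"),
  Effect = c(rep("Reproduction", 3), rep("Growth", 2),
             "Reproduction", rep("Mortality", 2), rep("Growth", 2)),
  Concentration = c(1.2, 1.4, 1.3, 1.5, 1.6, 1.2, 1.1, 1, 1.3, 1.4)
)
datDT <- as.data.table(foo)

# Grouping columns held in an ordinary character vector (illustrative name),
# so they can be built with paste0() inside a loop
group_cols <- c("Species", "Effect")

# One row per Species/Effect combination with the mean concentration
res <- datDT[, list(Concentration = mean(Concentration)), by = group_cols]
print(res)
```

Since `group_cols` is plain data rather than an expression, no eval or quoting is needed to change the grouping from one iteration to the next.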

An sqldf solution:

library(sqldf)
sqldf("select Species, Effect,
  avg(Concentration) `Concentration`
  from foo
  group by Species, Effect")

Answer 1: (score: 0)

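I modified your code to build a single command string, which is then evaluated with eval(parse(text = ...)). Note that FUN here no longer stands for a function but for the function's name as a string.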

for (i in 1:3) {
  for (j in 1:(N-1)) {
    for (k in (j+1):N) {
      for (FUN in c("mean","max","min")) {
        ...
        # get `mean(X1)` or `max(X2)` etc.
        e <- paste0(FUN,"(X",i,")")

        # get `list(V1,V2)` or `list(V2,V3)` etc.
        f <- paste0("list(V",j,",","V",k,")")

        # get `Y123_mean` etc.
        g <- paste0("Y",i,j,k,"_",FUN)

        # create the whole command
        command <- paste0("dt[,",g,":=",e,",by=",f,"]")

        # run command
        eval(parse(text = command))

        ...
      }
    }
  }
}
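As background on the original error: as.name() creates a single symbol whose name is the entire string, so as.name("list(V1,V2)") is a variable literally called `list(V1,V2)` rather than a call to list(), which is why it cannot be found. The loop can also be written without building and parsing a command string. The following is only a sketch of one possible alternative, assuming the reproducible data from the question with N = 4. It passes the grouping columns to `by` as a character vector, wraps the new column name in parentheses on the left of `:=`, and looks up the value column with get(). The names `xcol`, `by_cols`, `new_col`, and `f` are introduced here for illustration.

```r
library(data.table)

# Reproducible data from the question, with VN taken to be V4
dt <- data.table(
  X1 = seq(1, 1000), X2 = seq(1, 1000), X3 = seq(1, 1000),
  V1 = rep(seq(1, 10), 100), V2 = rep(seq(1, 5), 200),
  V3 = rep(seq(1, 4), 250),  V4 = rep(seq(1, 2), 500)
)
N <- 4

for (i in 1:3) {
  xcol <- paste0("X", i)                  # value column name, e.g. "X1"
  for (j in 1:(N - 1)) {
    for (k in (j + 1):N) {
      by_cols <- paste0("V", c(j, k))     # grouping columns, e.g. c("V1", "V2")
      for (FUN in c("mean", "max", "min")) {
        new_col <- paste0("Y", i, j, k, "_", FUN)  # e.g. "Y112_mean"
        f <- match.fun(FUN)               # turn the name into the function
        # (new_col) makes := use the string's value as the column name;
        # get(xcol) fetches the column named by the string within each group
        dt[, (new_col) := f(get(xcol)), by = by_cols]
      }
    }
  }
}
```

Compared with eval(parse(text = ...)), this keeps each piece as ordinary R data (strings and a function object), which tends to be easier to debug. Whether it is faster was not measured here.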