How to create summaries of subgroups based on factors in R

Asked: 2014-06-13 05:57:45

Tags: r group-summaries

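I would like to calculate the mean of each numeric variable in the example below. The means need to be grouped by each level of the factor "id", and also by each level of the factor "status".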

set.seed(10)
dfex <- data.frame(
  id = c("2","1","1","1","3","2","3"),
  status = c("hit","miss","miss","hit","miss","miss","miss"),
  var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7)
)

For the "id" groups, the first row of output would be labeled "mean-id-1". Rows labeled "mean-id-2" and "mean-id-3" would follow. For "status", the rows would be labeled "mean-status-miss" and "mean-status-hit". My goal is to generate these means and their row labels programmatically.

I have tried many different permutations of the apply functions, but each had problems. I have also tried the aggregate function.

3 Answers:

Answer 0: (score: 1)

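The fastest way to do this (for large data sets) is probably data.table. I did not find a way to show the new row names in a data.table object, so I convert the result back to a data.frame: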

library(data.table)
setDT(dfex) # convert `dfex` to a `data.table` object (by reference)
# setkey(dfex, id) # optional: only needed if you want the table sorted by "id" first
# Drop column 2 ("status"), then take the mean of every remaining column within each id
dat1 <- as.data.frame(dfex[, -2, with = FALSE][, lapply(.SD, mean), by = id])
rownames(dat1) <- paste0("mean-id-", as.character(dat1[, "id"]))
# Drop column 1 ("id"), then take the mean of every remaining column within each status
dat2 <- as.data.frame(dfex[, -1, with = FALSE][, lapply(.SD, mean), by = status])
rownames(dat2) <- paste0("mean-status-", as.character(dat2[, "status"]))
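
Since a data.table does not carry row names, one alternative is to keep the label as an ordinary column instead of converting back to a data.frame. The following is only a sketch under that assumption; the names `dt`, `vars`, `res` and `label` are illustrative and do not come from the answer above. It uses `as.data.table()` so that the original `dfex` is left untouched.

```r
library(data.table)

set.seed(10)
dfex <- data.frame(
  id = c("2","1","1","1","3","2","3"),
  status = c("hit","miss","miss","hit","miss","miss","miss"),
  var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7)
)

dt <- as.data.table(dfex)                      # copy; dfex stays a data.frame
vars <- grep("^var", names(dt), value = TRUE)  # the numeric columns to summarise

# Mean of each var* column within each id; .SDcols restricts .SD to `vars`
res <- dt[, lapply(.SD, mean), by = id, .SDcols = vars]

# Store the label in a regular column, because data.table has no row names
res[, label := paste0("mean-id-", id)]
print(res)
```

The same call with `by = status` and a "mean-status-" prefix would cover the second grouping.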

Answer 1: (score: 0)

In base R, the following works for the "id" column:

means_id <- aggregate(dfex[,grep("var",names(dfex))],list(dfex$id),mean)
rownames(means_id) <- paste0("mean-id-",means_id$Group.1)
means_id$Group.1 <- NULL

Output:

                var3       var4       var5       var6
mean-id-1 -0.7182503 -0.2604572 -0.3535823 -1.3530417
mean-id-2  0.2042702 -0.3009548  0.6121843 -1.4364211
mean-id-3 -0.4567655  0.8716131  0.1646053 -0.6229102

The same approach works for the "status" column:

means_status <- aggregate(dfex[,grep("var",names(dfex))],list(dfex$status),mean)
rownames(means_status) <- paste0("mean-status-",means_status$Group.1)
means_status$Group.1 <- NULL
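
The two aggregate() calls above differ only in the grouping column, so they can be wrapped in a small function and the results stacked into one table. This is a minimal sketch, not part of the original answer: `group_means` is a hypothetical helper name, and it assumes `dfex` is a plain data.frame whose numeric columns are the ones to average.

```r
# Hypothetical helper (illustrative, not from the original answer):
# per-group means of all numeric columns, for each grouping column in turn.
group_means <- function(df, group_cols) {
  num_cols <- names(df)[sapply(df, is.numeric)]
  pieces <- lapply(group_cols, function(g) {
    m <- aggregate(df[num_cols], by = list(df[[g]]), FUN = mean)
    # Build labels such as "mean-id-1" or "mean-status-hit"
    rownames(m) <- paste("mean", g, m$Group.1, sep = "-")
    m$Group.1 <- NULL
    m
  })
  do.call(rbind, pieces)  # stack the per-column summaries into one data.frame
}

set.seed(10)
dfex <- data.frame(
  id = c("2","1","1","1","3","2","3"),
  status = c("hit","miss","miss","hit","miss","miss","miss"),
  var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7)
)

all_means <- group_means(dfex, c("id", "status"))
print(all_means)
```

Passing the grouping column name into paste() is what keeps the row labels in step with the column being summarised.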

Answer 2: (score: 0)

You could do:

do.call(rbind,by(dfex[,-(1:2)], paste("mean-id",dfex[,1],sep="-"), colMeans)) 
              var3       var4       var5       var6
mean-id-1 -0.7383944  0.5005763 -0.4777325  0.6988741
mean-id-2 -0.0316267 -0.1764453  0.1313834  0.6867287
mean-id-3  0.7489377  0.8091953  0.9290247 -0.1263163

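To create both results as a list (the grouping column name is pasted into the label so that the "status" rows are not labeled "mean-id-..."):

 lapply(c("id","status"), function(x) do.call(rbind,by(dfex[grep("var",names(dfex))], paste("mean",x,dfex[[x]],sep="-"), colMeans)))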
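
If the list elements should be addressable by name rather than by position, one option is `sapply()` with `simplify = FALSE`, which names each element after the grouping column. This is a sketch under the assumption that `dfex` is a plain data.frame; `vars` and `res` are illustrative names.

```r
set.seed(10)
dfex <- data.frame(
  id = c("2","1","1","1","3","2","3"),
  status = c("hit","miss","miss","hit","miss","miss","miss"),
  var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7)
)

vars <- grep("var", names(dfex))  # positions of the var* columns

# simplify = FALSE keeps a list; USE.NAMES = TRUE names it "id" and "status"
res <- sapply(c("id", "status"), function(x) {
  do.call(rbind, by(dfex[vars], paste("mean", x, dfex[[x]], sep = "-"), colMeans))
}, simplify = FALSE, USE.NAMES = TRUE)

print(res$id)      # matrix of means per id
print(res$status)  # matrix of means per status
```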

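Update: to get standard deviations instead of means, colSds from the matrixStats package can be used in the same way: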

library(matrixStats)
# colSds() expects a matrix, so convert each subset; setNames() keeps the column labels
lapply(c("id","status"), function(x) do.call(rbind,by(dfex[grep("var",names(dfex))], paste("sd",x,dfex[[x]],sep="-"), function(d) setNames(colSds(as.matrix(d)), names(d)))))
[[1]]
             var3       var4      var5      var6
sd-id-1 0.6024318 1.36423044 0.5398717 0.7260939
sd-id-2 0.2623706 0.08870122 0.1827246 1.0590560
sd-id-3 1.0625137 0.16381062 1.0760977 0.3524908

[[2]]
                    var3     var4      var5      var6
sd-status-hit  0.4369311 1.036234 0.6622341 0.6506010
sd-status-miss 0.8288436 1.035163 0.7688912 0.6799636