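I want to compute the mean of each numeric variable in the example below. The means should be grouped by each level of the "id" factor, and separately by each level of the "status" factor.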
set.seed(10)
dfex <-
data.frame(id=c("2","1","1","1","3","2","3"),status=c("hit","miss","miss","hit","miss","miss","miss"),var3=rnorm(7),var4=rnorm(7),var5=rnorm(7),var6=rnorm(7))
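For the "id" groups, the first row of output would be labeled "mean-id-1", followed by rows labeled "mean-id-2" and "mean-id-3". For "status", the rows would be labeled "mean-status-miss" and "mean-status-hit". My goal is to generate these means and their row labels programmatically.
I have tried many permutations of the apply functions, but each one had problems. I have also tried the aggregate function.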
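Answer 0: (score: 1)
The fastest way to do this (for a large data set) is probably to use data.table, although I have not found a way to display new row names in a data.table object, so I convert the result back to a data.frame.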
library(data.table)
setDT(dfex) # convert `dfex` to a `data.table` object
#setkey(dfex, id) # This is not necessary, only if you want to sort your table by "id" column first
dat1 <- as.data.frame(dfex[, -2, with = FALSE][, lapply(.SD, mean), by = id])
rownames(dat1) <- paste0("mean-id-", as.character(dat1[,"id"]))
dat2 <- as.data.frame(dfex[, -1, with = FALSE][, lapply(.SD, mean), by = status])
rownames(dat2) <- paste0("mean-status-", as.character(dat2[,"status"]))
Answer 1: (score: 0)
In base R, the following works for the "id" column:
means_id <- aggregate(dfex[,grep("var",names(dfex))],list(dfex$id),mean)
rownames(means_id) <- paste0("mean-id-",means_id$Group.1)
means_id$Group.1 <- NULL
Output:
                var3       var4       var5       var6
mean-id-1 -0.7182503 -0.2604572 -0.3535823 -1.3530417
mean-id-2  0.2042702 -0.3009548  0.6121843 -1.4364211
mean-id-3 -0.4567655  0.8716131  0.1646053 -0.6229102
The same approach works for the "status" column:
means_status <- aggregate(dfex[,grep("var",names(dfex))],list(dfex$status),mean)
rownames(means_status) <- paste0("mean-status-",means_status$Group.1)
means_status$Group.1 <- NULL
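The two aggregate() calls above differ only in the grouping column, so they can be folded into one helper. The following is a minimal sketch, assuming base R only; the function name group_means and the combined table all_means are hypothetical names introduced for illustration, not part of the original answer.

```r
# Rebuild the example data from the question
set.seed(10)
dfex <- data.frame(
  id     = c("2", "1", "1", "1", "3", "2", "3"),
  status = c("hit", "miss", "miss", "hit", "miss", "miss", "miss"),
  var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7)
)

# Hypothetical helper: means of every numeric column, grouped by `group_col`,
# with rows labeled "mean-<group_col>-<level>"
group_means <- function(df, group_col) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  out <- aggregate(df[num_cols], by = list(group = df[[group_col]]), FUN = mean)
  rownames(out) <- paste("mean", group_col, out$group, sep = "-")
  out$group <- NULL
  out
}

# Stack the "id" and "status" results into one labeled data frame
all_means <- do.call(rbind, lapply(c("id", "status"), group_means, df = dfex))
print(all_means)
```

Because the label is built from the column name, adding another grouping column only requires extending the vector passed to lapply().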
Answer 2: (score: 0)
You could do this:
do.call(rbind,by(dfex[,-(1:2)], paste("mean-id",dfex[,1],sep="-"), colMeans))
                var3       var4       var5       var6
mean-id-1 -0.7383944  0.5005763 -0.4777325  0.6988741
mean-id-2 -0.0316267 -0.1764453  0.1313834  0.6867287
mean-id-3  0.7489377  0.8091953  0.9290247 -0.1263163
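To create both results as a list (the label is built from the column name, so the "status" rows come out as "mean-status-..." rather than "mean-id-..."):
lapply(c("id","status"), function(x) do.call(rbind,by(dfex[grep("var",names(dfex))], paste("mean",x,dfex[[x]],sep="-"), colMeans)))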
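Update: the same pattern gives per-group standard deviations with colSds from matrixStats (the row labels below still carry the "mean-id" prefix):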
library(matrixStats)
lapply(c("id","status"), function(x) do.call(rbind,by(dfex[grep("var",names(dfex))], paste("mean-id",dfex[,x],sep="-"), colSds)))
[[1]]
               var3       var4      var5      var6
mean-id-1 0.6024318 1.36423044 0.5398717 0.7260939
mean-id-2 0.2623706 0.08870122 0.1827246 1.0590560
mean-id-3 1.0625137 0.16381062 1.0760977 0.3524908

[[2]]
                  var3     var4      var5      var6
mean-id-hit  0.4369311 1.036234 0.6622341 0.6506010
mean-id-miss 0.8288436 1.035163 0.7688912 0.6799636
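For comparison, here is a sketch of the same standard-deviation table without the matrixStats dependency, using sd() from base R on each column of each group. The helper name group_sds and the "sd-" label prefix are assumptions made for illustration; the original answer labels these rows "mean-id-...".

```r
# Rebuild the example data from the question
set.seed(10)
dfex <- data.frame(
  id     = c("2", "1", "1", "1", "3", "2", "3"),
  status = c("hit", "miss", "miss", "hit", "miss", "miss", "miss"),
  var3 = rnorm(7), var4 = rnorm(7), var5 = rnorm(7), var6 = rnorm(7)
)

# Hypothetical helper: per-group standard deviation of every "var" column,
# with rows labeled "sd-<group_col>-<level>"
group_sds <- function(df, group_col) {
  vars <- df[grep("^var", names(df))]
  # by() splits `vars` by group; sapply(g, sd) returns one named value per column
  out <- do.call(rbind, by(vars, df[[group_col]], function(g) sapply(g, sd)))
  rownames(out) <- paste("sd", group_col, rownames(out), sep = "-")
  out
}

# One matrix per grouping column, returned as a list
sds <- lapply(c("id", "status"), group_sds, df = dfex)
print(sds)
```

Prefixing the rows with "sd" rather than "mean" keeps the labels consistent with the statistic being reported.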