Question

Is there a more efficient way to do the following, i.e. aggregate a matrix by a grouping variable?

library(reshape2)  # melt() and dcast() come from reshape2
mat <- matrix( sample(2:100, 50), ncol=10, nrow=5)
colnames(mat) <- c(LETTERS[1:10])
rownames(mat) <- 1:5
mat.m <- melt(mat)
mat.m$Group <- NA

df <- cbind( data.frame(ID=LETTERS[1:10]), data.frame(Group=c("Plant","Fish","Rodent","Fish","Rodent","Bird","Plant","Fish","Bird","Bird")))
df$ID <- as.character(df$ID)
df$Group <- as.character(df$Group)

for( i in 1:nrow(mat.m) ){
  for( j in 1:nrow(df) ){
    mat.m$Group[i] <- ifelse(mat.m$Var2[i]==df$ID[j], df$Group[j], mat.m$Group[i])
  }
} 

mat.agg <- dcast(mat.m, Var1~Group, fun.aggregate = sum)

mat.agg
   Bird Fish Plant Rodent
1  154  215    43     83
2  122   44   132    163
3  177  211   118    120
4  206  125    89     92
5  125  269   151    156

My matrices are very large, so I'd like to know whether there is a more efficient approach.
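For very large matrices, most of the time in the code above is likely spent in the double loop, which makes nrow(mat.m) × nrow(df) comparisons. Below is a minimal sketch of the same long-format pipeline with that loop replaced by a single vectorized match() lookup. It uses base R's as.data.frame(as.table()) and tapply() in place of reshape2's melt() and dcast() so it runs without extra packages. The set.seed() call is an added assumption, used only to make the run reproducible.

```r
# Sketch: same idea as the question, but with the double loop replaced by
# one vectorized match(). Base R only.
set.seed(1)  # assumption: added only for reproducibility
mat <- matrix(sample(2:100, 50), ncol = 10, nrow = 5,
              dimnames = list(1:5, LETTERS[1:10]))
df <- data.frame(ID = LETTERS[1:10],
                 Group = c("Plant", "Fish", "Rodent", "Fish", "Rodent",
                           "Bird", "Plant", "Fish", "Bird", "Bird"),
                 stringsAsFactors = FALSE)

# Long format: columns Var1 (row name), Var2 (column name), Freq (value)
long <- as.data.frame(as.table(mat))

# One lookup for every row, instead of nrow(long) * nrow(df) ifelse() calls
long$Group <- df$Group[match(long$Var2, df$ID)]

# Sum the values in each (row, group) cell; the result is a 5 x 4 matrix
mat.agg <- tapply(long$Freq, list(long$Var1, long$Group), sum)
mat.agg
```

The lookup line is the same change you would make inside the reshape2 version: `mat.m$Group <- df$Group[match(mat.m$Var2, df$ID)]` in place of the two for loops.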

Answer 1

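We can split the 'ID' column of 'df' by 'Group', loop over the resulting list with vapply, subset the columns of 'mat' by each group's IDs, and take the rowSums of each subset to get a matrix as output:

vapply(split(df$ID, df$Group), function(x) rowSums(mat[, x, drop = FALSE]), numeric(nrow(mat)))

Here drop = FALSE keeps a single-column subset as a matrix, so rowSums also works for a group that contains only one ID.

Note: the vapply approach is fast, and grouping the IDs once with split also makes it efficient.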
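A base R alternative to the vapply approach is rowsum(), which sums the rows of a matrix within groups. Applied to the transpose, it sums the columns of 'mat' by group. The sketch below assumes the same 'mat' and 'df' as the question (with a seed added for reproducibility), computes both versions and checks that they agree. The names grp, agg_rowsum and agg_vapply are illustrative only.

```r
# Sketch: column sums by group via rowsum() on the transpose,
# compared against the vapply()/split() approach.
set.seed(1)  # assumption: added only for reproducibility
mat <- matrix(sample(2:100, 50), ncol = 10, nrow = 5,
              dimnames = list(1:5, LETTERS[1:10]))
df <- data.frame(ID = LETTERS[1:10],
                 Group = c("Plant", "Fish", "Rodent", "Fish", "Rodent",
                           "Bird", "Plant", "Fish", "Bird", "Bird"),
                 stringsAsFactors = FALSE)

# Group label for each column of mat, aligned by column name
grp <- df$Group[match(colnames(mat), df$ID)]

# rowsum() sums the rows of t(mat) within each group (groups sorted);
# transposing back gives one row per original row, one column per group
agg_rowsum <- t(rowsum(t(mat), grp))

# The vapply() approach, with drop = FALSE for one-column groups
agg_vapply <- vapply(split(df$ID, df$Group),
                     function(x) rowSums(mat[, x, drop = FALSE]),
                     numeric(nrow(mat)))

# Compare values and dimnames of the two results
all.equal(agg_rowsum, agg_vapply)
```

rowsum() sorts the groups by default (reorder = TRUE), so its columns come out in the same order as the list returned by split(). Which version is faster depends on the matrix, so it is worth timing both on real data, for example with system.time().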
