Aggregate columns by group variable

Time: 2020-07-19 19:08:35

Tags: r aggregate

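I am using R and I want to aggregate columns according to the group each one belongs to. In this example, instead of ten columns I would end up with three (high, medium and low), each holding the aggregated values. If these were rows I would use aggregate, but I can't figure out how to do it with columns.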

set.seed(4)
a<-matrix(runif(40),ncol=10,nrow=4)
colnames(a)<-letters[1:10]
a
               a         b          c         d         e
[1,] 0.585800305 0.8135742 0.94904022 0.1000535 0.9710557
[2,] 0.008945796 0.2604278 0.07314447 0.9540688 0.5839880
[3,] 0.293739612 0.7244059 0.75467503 0.4156071 0.9622046
[4,] 0.277374958 0.9060922 0.28600062 0.4551024 0.7617024
             f         g         h         i           j
[1,] 0.7145085 0.6491614 0.5137017 0.8779959 0.460025911
[2,] 0.9966129 0.8308064 0.5297775 0.6545220 0.622056487
[3,] 0.5062709 0.4819990 0.5671122 0.4823709 0.388418035
[4,] 0.4899432 0.8417462 0.2389489 0.9710298 0.006592727

type<-c("high","high","low","high","medium","high","medium","high","low","low")

1 Answer:

Answer 0 (score: 0)

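We can replicate type so that every element of the matrix gets the group label of its column, and pass that to tapply. This returns one overall total per group: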

tapply(a, type[col(a)], FUN = sum)
#    high       low    medium 
#10.352068  6.525872  6.082664 
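
To see why indexing type with col(a) works, it may help to look at the intermediate objects. This is only a minimal sketch that rebuilds a and type from the question; the names idx, grp and totals are introduced here for illustration.

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
colnames(a) <- letters[1:10]
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# col(a) is an integer matrix with the same shape as a,
# in which every entry holds its own column index
idx <- col(a)
dim(idx)    # 4 10
idx[, 3]    # 3 3 3 3

# Indexing type by idx gives one group label per element of a,
# in the same column-major order in which tapply reads a
grp <- type[idx]
length(grp) # 40

# One total per group, as in the answer
totals <- tapply(a, grp, FUN = sum)
totals
```

Because grp lines up element by element with a, tapply can group the 40 values directly without reshaping the matrix.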

Or, if the sums are needed row by row:

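# drop = FALSE keeps a group with a single column as a matrix, so rowSums still works
sapply(split(seq_along(type), type), function(i) rowSums(a[, i, drop = FALSE]))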
#         high      low   medium
#[1,] 2.727638 2.287062 1.620217
#[2,] 2.749833 1.349723 1.414794
#[3,] 2.507136 1.625464 1.444204
#[4,] 2.367462 1.263623 1.603449
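
As an alternative that is not part of the original answer, base R's rowsum() sums the rows of a matrix by group, so it should give the same result once the matrix is transposed. A minimal sketch, assuming a and type as defined in the question (by_row and res are illustrative names):

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# rowsum() groups rows, so transpose first to turn the columns of a
# into rows, then transpose the result back to 4 rows x 3 groups
by_row <- t(rowsum(t(a), group = type))
by_row

# The split/sapply version, for comparison
res <- sapply(split(seq_along(type), type),
              function(i) rowSums(a[, i, drop = FALSE]))

# Compare the values only, ignoring dimnames
all.equal(unname(by_row), unname(res))
```

This avoids the explicit loop over groups, which may matter when there are many columns.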

Or, more compactly:

sapply(split.default(as.data.frame(a), type), rowSums)
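
The compact form relies on split.default treating a data frame as a list of columns. A rough sketch of what it produces, assuming a and type from the question (parts is an illustrative name, and rowMeans is swapped in only to show that other row-wise functions fit the same pattern):

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
colnames(a) <- letters[1:10]
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# split.default() splits the columns of the data frame by type,
# returning one smaller data frame per group
parts <- split.default(as.data.frame(a), type)
names(parts)         # "high" "low" "medium"
sapply(parts, ncol)  # 5 3 2

# Any function that works row-wise on a data frame can be applied
sums  <- sapply(parts, rowSums)
means <- sapply(parts, rowMeans)
sums
means
```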

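Or use aggregate, which returns the per-row group sums in long format (one row per combination of matrix row and group):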

aggregate(Freq ~ ., as.data.frame.table(`colnames<-`(a, type)), FUN = sum)
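
If the wide layout is preferred, the long result of aggregate can be cross-tabulated again. The sketch below assumes the default column names Var1, Var2 and Freq that as.data.frame.table assigns when the matrix has no named dimnames; long and wide are illustrative names.

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# Long format: one row per (matrix row, group) pair
long <- aggregate(Freq ~ ., as.data.frame.table(`colnames<-`(a, type)),
                  FUN = sum)
names(long)
nrow(long)

# Back to wide: matrix rows down the side, groups across the top
wide <- xtabs(Freq ~ Var1 + Var2, data = long)
wide
```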

Or use split to break the data into a list of vectors, one per group, and loop over the list to return the sum of each:

sapply(split(a, type[col(a)]), sum)
#    high       low    medium 
#10.352068  6.525872  6.082664
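
To make the split step visible, here is a small sketch, again assuming a and type from the question (pieces and totals are illustrative names). It uses vapply instead of sapply, which is a stylistic choice on my part so that the result is guaranteed to be a numeric vector.

```r
set.seed(4)
a <- matrix(runif(40), ncol = 10, nrow = 4)
type <- c("high", "high", "low", "high", "medium",
          "high", "medium", "high", "low", "low")

# split() treats the matrix as a plain vector of 40 values and
# distributes them into one numeric vector per group
pieces <- split(a, type[col(a)])
lengths(pieces)  # high 20, low 12, medium 8

# Sum each vector; numeric(1) states that each result is one number
totals <- vapply(pieces, sum, numeric(1))
totals
```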