R: Aggregating multiple columns of a matrix by another column using data.table

Time: 2018-12-28 11:24:28

Tags: r data.table aggregate

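I have a matrix like the one below, which includes an experiment column. I would like to summarize the values of every other column as the median within each experiment, using data.table if possible.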

This is my matrix:

mymat <- cbind(matrix(rnorm(200, 2, 1), ncol=10), rep(c(1,2,3,4), each=5))
colnames(mymat) <- c(letters[1:10], "experiment")
> mymat
                a         b         c        d          e          f         g
 [1,]  1.85290636 1.4655244 1.5928758 2.806357  2.3877045  0.7652088 1.5453970
 [2,]  0.59623693 2.9358288 0.3844413 4.078820  3.3095328  1.1690837 1.2415376
 [3,]  1.16248750 2.5888841 0.9297899 1.533448  2.8787113  1.6461947 1.8478815
 [4,]  2.03978005 1.5790537 2.5109517 1.795046  0.7573681  2.7292048 3.5413826
 [5,]  2.18725618 2.1067400 1.6796388 1.575302 -0.7442371  2.3377271 3.3095631
 [6,]  0.52950277 3.1111915 2.4163010 2.523962  2.5112347  2.9664337 0.2421029
 [7,]  3.66119205 3.4821901 3.1755414 3.062482  3.4375954  1.3338193 3.6183306
 [8,]  2.28161722 1.9757365 1.7151048 2.292672  2.1081080  0.1798869 2.2526790
 [9,]  0.90552795 2.0257843 2.0816727 3.594058  2.6703535  1.2868582 2.5727511
[10,]  3.02884367 2.7644316 3.4798124 2.093063  0.8848031 -0.8261005 3.9786736
[11,]  1.97837003 2.2279987 1.9420593 2.502632  4.1310802  4.0349174 2.4540287
[12,]  1.88415883 0.8000948 2.1097440 2.146443  1.8244366  2.7456270 0.6833915
[13,]  3.65748686 2.0496098 2.6516943 1.830966  2.1950348  2.5920219 0.4546199
[14,]  2.08011351 1.9388831 2.5694895 3.554265  1.3218541  1.4456804 0.7243542
[15,]  2.58591994 3.4888353 1.3391290 3.276568  2.9322798  2.7518610 2.8188685
[16,]  2.76149212 3.1832648 2.5463351 1.199424  1.6953413  2.2278953 0.8489631
[17,]  2.34506440 1.6902070 2.1772089 1.825538  1.1662359  1.5721568 1.2976330
[18,]  2.70533427 1.7654916 2.6679859 1.774898  3.4739425  1.1332421 3.7996391
[19,] -0.03714119 0.6990952 1.3906477 1.517317  1.4790870  1.9142362 2.7310054
[20,]  2.34356130 2.4769488 1.6125056 1.031128  2.6468456  0.8609739 2.3967517
              h           i          j experiment
 [1,] 1.4357545  0.78871544  1.3165557          1
 [2,] 1.9623772  0.16997675  2.4836771          1
 [3,] 4.0266953  2.19620210  1.6624591          1
 [4,] 2.5706419  1.10367611  1.5068350          1
 [5,] 0.9501494  2.42311167  2.4161852          1
 [6,] 3.6892954 -0.08657205  1.5484951          2
 [7,] 2.8480423  2.20989942  2.8492618          2
 [8,] 2.8882966  3.29415999  2.4653390          2
 [9,] 2.1939305  2.94620493  0.7651023          2
[10,] 2.0632747  2.07978600 -0.9511181          2
[11,] 1.4849639  2.10980781  2.3802551          3
[12,] 1.7493999  2.76751329  0.1213996          3
[13,] 2.8929216  0.59990881  3.1995611          3
[14,] 1.1864610  2.00998033  2.3789583          3
[15,] 0.7608926 -0.51588405  1.0342682          3
[16,] 1.7695505  2.82560378  2.9199983          4
[17,] 2.6722732  3.45027838  1.9244870          4
[18,] 2.5383051  1.11827228  1.7554269          4
[19,] 1.4848365  2.78852751  2.5178287          4
[20,] 2.1704674  0.78512549  1.0848203          4

I know how to aggregate it the way I want by naming every single column explicitly, for example:

ddd <- data.table(mymat)
mymat2 <- as.data.frame(ddd[, list(a=median(a), b=median(b), c=median(c)), by=list(experiment=experiment)])
> mymat2
  experiment        a        b         c
1          1 2.138530 1.964590 1.4884908
2          2 0.503533 2.065438 1.1209868
3          3 1.968952 1.362444 2.2133432
4          4 3.215352 1.498616 0.8150356

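But what about a real-life case with hundreds of columns to summarize? Is there a way to do the above without naming every column, while keeping the column names?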

Note that if dim(mymat) is 20x11, then dim(mymat2) should be 4x11.
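
For reference, the data.table idiom usually used for this kind of per-group summary is lapply(.SD, median) together with by, where .SD stands for all columns other than the grouping column. The following is only a minimal sketch of that idea, assuming the data.table package is installed and rebuilding mymat as above; the set.seed call and the name med are additions for illustration and are not part of the original code.

```r
library(data.table)

set.seed(1)  # assumption: added only so the sketch is reproducible
mymat <- cbind(matrix(rnorm(200, 2, 1), ncol = 10), rep(c(1, 2, 3, 4), each = 5))
colnames(mymat) <- c(letters[1:10], "experiment")

ddd <- as.data.table(mymat)

# .SD holds every column except the grouping column,
# so no column names have to be written out.
med <- ddd[, lapply(.SD, median), by = experiment]

dim(med)    # one row per experiment, one column per original column
names(med)  # grouping column first, original names kept
```

If a data frame is wanted, as with mymat2 above, the result can be wrapped in as.data.frame(med). To restrict the summary to a subset of columns, the .SDcols argument accepts a character vector of column names.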

Thanks!

0 answers:

No answers yet.