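I have a matrix like the one below, with an experiment column. Using data.table wherever possible, I want to summarize every column to its median within each experiment.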
Here is my matrix:
# 20 rows x 10 random columns, plus an experiment column (5 rows per experiment)
mymat <- cbind(matrix(rnorm(200, 2, 1), ncol=10), rep(c(1,2,3,4), each=5))
colnames(mymat) <- c(letters[1:10], "experiment")
> mymat
a b c d e f g
[1,] 1.85290636 1.4655244 1.5928758 2.806357 2.3877045 0.7652088 1.5453970
[2,] 0.59623693 2.9358288 0.3844413 4.078820 3.3095328 1.1690837 1.2415376
[3,] 1.16248750 2.5888841 0.9297899 1.533448 2.8787113 1.6461947 1.8478815
[4,] 2.03978005 1.5790537 2.5109517 1.795046 0.7573681 2.7292048 3.5413826
[5,] 2.18725618 2.1067400 1.6796388 1.575302 -0.7442371 2.3377271 3.3095631
[6,] 0.52950277 3.1111915 2.4163010 2.523962 2.5112347 2.9664337 0.2421029
[7,] 3.66119205 3.4821901 3.1755414 3.062482 3.4375954 1.3338193 3.6183306
[8,] 2.28161722 1.9757365 1.7151048 2.292672 2.1081080 0.1798869 2.2526790
[9,] 0.90552795 2.0257843 2.0816727 3.594058 2.6703535 1.2868582 2.5727511
[10,] 3.02884367 2.7644316 3.4798124 2.093063 0.8848031 -0.8261005 3.9786736
[11,] 1.97837003 2.2279987 1.9420593 2.502632 4.1310802 4.0349174 2.4540287
[12,] 1.88415883 0.8000948 2.1097440 2.146443 1.8244366 2.7456270 0.6833915
[13,] 3.65748686 2.0496098 2.6516943 1.830966 2.1950348 2.5920219 0.4546199
[14,] 2.08011351 1.9388831 2.5694895 3.554265 1.3218541 1.4456804 0.7243542
[15,] 2.58591994 3.4888353 1.3391290 3.276568 2.9322798 2.7518610 2.8188685
[16,] 2.76149212 3.1832648 2.5463351 1.199424 1.6953413 2.2278953 0.8489631
[17,] 2.34506440 1.6902070 2.1772089 1.825538 1.1662359 1.5721568 1.2976330
[18,] 2.70533427 1.7654916 2.6679859 1.774898 3.4739425 1.1332421 3.7996391
[19,] -0.03714119 0.6990952 1.3906477 1.517317 1.4790870 1.9142362 2.7310054
[20,] 2.34356130 2.4769488 1.6125056 1.031128 2.6468456 0.8609739 2.3967517
h i j experiment
[1,] 1.4357545 0.78871544 1.3165557 1
[2,] 1.9623772 0.16997675 2.4836771 1
[3,] 4.0266953 2.19620210 1.6624591 1
[4,] 2.5706419 1.10367611 1.5068350 1
[5,] 0.9501494 2.42311167 2.4161852 1
[6,] 3.6892954 -0.08657205 1.5484951 2
[7,] 2.8480423 2.20989942 2.8492618 2
[8,] 2.8882966 3.29415999 2.4653390 2
[9,] 2.1939305 2.94620493 0.7651023 2
[10,] 2.0632747 2.07978600 -0.9511181 2
[11,] 1.4849639 2.10980781 2.3802551 3
[12,] 1.7493999 2.76751329 0.1213996 3
[13,] 2.8929216 0.59990881 3.1995611 3
[14,] 1.1864610 2.00998033 2.3789583 3
[15,] 0.7608926 -0.51588405 1.0342682 3
[16,] 1.7695505 2.82560378 2.9199983 4
[17,] 2.6722732 3.45027838 1.9244870 4
[18,] 2.5383051 1.11827228 1.7554269 4
[19,] 1.4848365 2.78852751 2.5178287 4
[20,] 2.1704674 0.78512549 1.0848203 4
I know how to aggregate it the way I want if I name every column explicitly, for example:
library(data.table)
ddd <- data.table(mymat)
# Median of each explicitly listed column within each experiment
mymat2 <- as.data.frame(ddd[, list(a=median(a), b=median(b), c=median(c)), by=list(experiment=experiment)])
> mymat2
experiment a b c
1 1 2.138530 1.964590 1.4884908
2 2 0.503533 2.065438 1.1209868
3 3 1.968952 1.362444 2.2133432
4 4 3.215352 1.498616 0.8150356
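But in real life I need to summarize hundreds of columns. Is it possible to do the above without typing out every column name, while keeping the original column names?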
Note that if dim(mymat) is 20x11, then dim(mymat2) should be 4x11.
Thanks!
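One common data.table idiom for this is a minimal sketch like the following, assuming the data.table package is installed. Inside `j`, `.SD` holds all non-grouping columns for the current group, so `lapply(.SD, median)` computes a median for every column and keeps each column's name. `set.seed(1)` is added here only for reproducibility and is not part of the original question.

```r
library(data.table)

# Same shape as the question's matrix: 20 rows, 10 value columns + experiment
set.seed(1)  # added only so this sketch is reproducible
mymat <- cbind(matrix(rnorm(200, 2, 1), ncol = 10), rep(c(1, 2, 3, 4), each = 5))
colnames(mymat) <- c(letters[1:10], "experiment")

ddd <- data.table(mymat)

# .SD = every column except the grouping column, for the current experiment;
# lapply() applies median() to each one and keeps the original column names
mymat2 <- as.data.frame(ddd[, lapply(.SD, median), by = experiment])

dim(mymat)   # 20 11
dim(mymat2)  # 4 11
```

The grouping column (experiment) comes first in the result, followed by a through j. If the real table also had columns that should not be summarized, data.table's `.SDcols` argument can restrict `.SD` to a chosen subset.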