Suppose I have an R data.frame df, built as follows:
df = data.frame(matrix(rnorm(24*5), nrow=24, ncol=5))
f1 = rep(c('A', 'B', 'C', 'D'), each=6)
f2 = rep(c('i', 'ii'), times=12)
df$f1 = as.factor(f1)
df$f2 = as.factor(f2)
df
X1 X2 X3 X4 X5 f1 f2
1 0.43199861 0.710242961 1.339928854 -1.241609127 0.222987482 A i
2 1.38058957 -0.084379985 0.007244097 -1.505817169 1.841083186 A ii
3 -0.07266697 0.194356316 -0.566179369 1.178202899 -1.583327136 A i
4 -0.10157803 0.137415112 -0.011487657 -0.324716212 1.161609061 A ii
5 0.98067650 1.824717342 -1.048111998 -0.825228970 -0.968037647 A i
6 0.24261186 -2.116217786 0.027420259 -1.232210879 -1.868444772 A ii
7 -0.73898107 -0.883783872 -0.556182026 -1.662352192 -0.583576555 B i
8 -1.25095555 -0.583574360 0.285764366 1.959217909 0.625261013 B ii
9 -0.30281764 -1.319204327 -0.984133568 -1.219553912 -0.059147710 B i
10 -1.85947863 0.384337575 0.713635785 -1.101081205 -0.378312099 B ii
11 -0.50185467 -0.072254218 0.163350676 -1.718950235 -1.367719178 B i
12 0.48938546 -0.005681783 -0.326662794 1.027273649 -0.490005391 B ii
13 -1.24160913 0.222987482 0.431998610 0.710242961 1.339928854 C i
14 -1.50581717 1.841083186 1.380589565 -0.084379985 0.007244097 C ii
15 1.17820290 -1.583327136 -0.072666966 0.194356316 -0.566179369 C i
16 -0.32471621 1.161609061 -0.101578026 0.137415112 -0.011487657 C ii
17 -0.82522897 -0.968037647 0.980676496 1.824717342 -1.048111998 C i
18 -1.23221088 -1.868444772 0.242611864 -2.116217786 0.027420259 C ii
19 -1.66235219 -0.583576555 -0.738981072 -0.883783872 -0.556182026 D i
20 1.95921791 0.625261013 -1.250955549 -0.583574360 0.285764366 D ii
21 -1.21955391 -0.059147710 -0.302817635 -1.319204327 -0.984133568 D i
22 -1.10108120 -0.378312099 -1.859478634 0.384337575 0.713635785 D ii
23 -1.71895024 -1.367719178 -0.501854665 -0.072254218 0.163350676 D i
24 1.02727365 -0.490005391 0.489385461 -0.005681783 -0.326662794 D ii
What is the best way to average this data by the two factors? The result should have 8 rows, since f1 has 4 levels and f2 has 2 levels.
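I have looked at by and aggregate. The idea with both is to use a formula to specify the groups. The problem is that I have many X variables, so I cannot write them all out in the formula.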
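For reference, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes base R only and relies on the formula method of aggregate accepting a dot on the left-hand side to stand for every column not named on the right. The set.seed call and the names means, means2 and x_cols are my own additions for illustration and are not part of the data above.

```r
# Rebuild the example data (the seed is an assumption, added only for reproducibility)
set.seed(1)
df <- data.frame(matrix(rnorm(24 * 5), nrow = 24, ncol = 5))
df$f1 <- factor(rep(c("A", "B", "C", "D"), each = 6))
df$f2 <- factor(rep(c("i", "ii"), times = 12))

# "." on the left-hand side stands for all columns not used on the right,
# so none of the X variables has to be typed out
means <- aggregate(. ~ f1 + f2, data = df, FUN = mean)

# Non-formula alternative: select the value columns by name and
# pass the grouping columns through the `by` argument
x_cols <- setdiff(names(df), c("f1", "f2"))
means2 <- aggregate(df[x_cols], by = df[c("f1", "f2")], FUN = mean)

print(means)
```

Both calls should give one row per combination of f1 and f2, with the grouping columns first and the averaged X columns after them.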