The dataset used here is genotype, from the MASS package.
> names(genotype)
[1] "Litter" "Mother" "Wt"
> str(genotype)
'data.frame': 61 obs. of 3 variables:
$ Litter: Factor w/ 4 levels "A","B","I","J": 1 1 1 1 1 1 1 1 1 1 ...
$ Mother: Factor w/ 4 levels "A","B","I","J": 1 1 1 1 1 2 2 2 3 3 ...
$ Wt : num 61.5 68.2 64 65 59.7 55 42 60.2 52.5 61.8 ...
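Here is the question as given in a tutorial: Exercise 6.7. Find the heaviest rats born to each mother in the genotype() data.
tapply, split by the factor genotype$Mother, gives: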
> tapply(genotype$Wt, genotype$Mother, max)
A B I J
68.2 69.8 61.8 61.0
Also:
> out <- tapply(genotype$Wt, genotype[,1:2],max)
> out
Mother
Litter A B I J
A 68.2 60.2 61.8 61.0
B 60.3 64.7 59.0 51.3
I 68.0 69.8 61.3 54.5
J 59.0 59.5 61.4 54.0
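The first tapply gives the heaviest rat for each mother, and the second (out) gives a table from which I can work out which Litter was heaviest for each mother. Is there another way to match each mother to her heaviest Litter, for instance if the 2-dimensional table is really large?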
Answer 0 (score: 3)
We can use data.table. We convert the 'data.frame' to a 'data.table' (setDT(genotype)), create the index with which.max, and subset the rows of the dataset grouped by 'Mother'.
library(data.table)#v1.9.5+
setDT(genotype)[, .SD[which.max(Wt)], by = Mother]
# Mother Litter Wt
#1: A A 68.2
#2: B I 69.8
#3: I A 61.8
#4: J A 61.0
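For readers without data.table, the same "pick the which.max row within each group" idea can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame and its values are made up as a stand-in for genotype, and the name heaviest is hypothetical.

```r
# Toy stand-in for MASS::genotype; the values are invented for illustration.
toy <- data.frame(
  Litter = factor(c("A", "B", "I", "A", "J", "B")),
  Mother = factor(c("A", "A", "A", "B", "B", "B")),
  Wt     = c(61.5, 68.2, 64.0, 55.0, 42.0, 60.2)
)

# Split the rows by Mother, keep the single row with the largest Wt in each
# piece, then stack the pieces back into one data frame.
pieces   <- split(toy, toy$Mother)
heaviest <- do.call(rbind, lapply(pieces, function(d) d[which.max(d$Wt), ]))
heaviest
```

Note that which.max returns only the first maximum, so a mother with two equally heavy rats would still contribute one row.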
If we are only interested in the max of 'Wt' by 'Mother':
setDT(genotype)[, list(Wt=max(Wt)), by = Mother]
# Mother Wt
#1: A 68.2
#2: B 69.8
#3: I 61.8
#4: J 61.0
Based on the last tapply code shown by the OP, if we need similar output, we can use dcast from the devel version of 'data.table':
dcast(setDT(genotype), Litter ~ Mother, value.var='Wt', max)
# Litter A B I J
#1: A 68.2 60.2 61.8 61.0
#2: B 60.3 64.7 59.0 51.3
#3: I 68.0 69.8 61.3 54.5
#4: J 59.0 59.5 61.4 54.0
Answer 1 (score: 1)
From stats:
aggregate(. ~ Mother, data = genotype, max)
or
aggregate(Wt ~ Mother, data = genotype, max)
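The aggregate call returns only the maximum weight per mother, not the Litter it belongs to. One possible way to recover the Litter, sketched here on made-up toy data rather than on genotype itself, is to merge the per-mother maxima back onto the original rows. The names toy, best and matched are hypothetical.

```r
# Toy stand-in for MASS::genotype; the values are invented for illustration.
toy <- data.frame(
  Litter = factor(c("A", "B", "I", "A", "J", "B")),
  Mother = factor(c("A", "A", "A", "B", "B", "B")),
  Wt     = c(61.5, 68.2, 64.0, 55.0, 42.0, 60.2)
)

# Step 1: maximum weight per mother, as in the answer above.
best <- aggregate(Wt ~ Mother, data = toy, FUN = max)

# Step 2: keep only the original rows whose (Mother, Wt) pair matches a maximum.
matched <- merge(toy, best, by = c("Mother", "Wt"))
matched
```

Unlike which.max, this keeps every tied row, so a mother with two equally heavy rats would appear twice.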