Consider the following dataset:
set.seed(144)
d=data.frame(x=round(runif(30)),y=sample(LETTERS[1:3],30,TRUE),z=sample(LETTERS[1:3],30,TRUE))
However, calling
aggregate(x~y+z,d,table)
gives me this:
head(aggregate(x~y+z,d,table))
y z x
1 A A 3, 1
2 B A 2, 2
3 C A 2, 1
4 A B 2, 2
5 B B 1, 3
6 C B 2
7 A C 2, 2
8 B C 2, 1
9 C C 1, 1
whereas what I need in column x is the ratio of the counts (number of zeros divided by number of ones):
y z x
1 A A 3.0000000
2 B A 1.0000000
3 C A 2.0000000
4 A B 1.0000000
5 B B 0.3333333
6 C B 0.0000000
7 A C 1.0000000
8 B C 2.0000000
9 C C 1.0000000
Answer 0 (score: 2)
You can pass a user-defined function to aggregate to compute your ratio:
# Setting seed to make a reproducible example
set.seed(144)
d=data.frame(x=round(runif(30)),y=sample(LETTERS[1:3],30,TRUE),z=sample(LETTERS[1:3],30,TRUE))
head(aggregate(x~y+z, d, function(x) sum(x == 0) / sum(x == 1)))
# y z x
# 1 A A 3.0000000
# 2 B A 1.0000000
# 3 C A 2.0000000
# 4 A B 1.0000000
# 5 B B 0.3333333
# 6 C B 0.0000000
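To make the edge cases of this ratio concrete, here is a minimal sketch on a small hand-made data frame. The names `toy` and `zero_one_ratio` are illustrative assumptions and not part of the original answer; the computation itself is the same as the anonymous function above. It suggests that a group with no zeros yields 0, while a group with no ones yields Inf, because R divides by zero without raising an error.

```r
# Hypothetical helper name; same computation as the anonymous function above.
zero_one_ratio <- function(x) sum(x == 0) / sum(x == 1)

# Small hand-made dataset (an assumption for illustration, not the question's data)
toy <- data.frame(
  x = c(0, 0, 0, 1, 1, 1, 0, 1, 0),
  y = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  z = c("A", "A", "A", "A", "A", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

res <- aggregate(x ~ y + z, toy, zero_one_ratio)
print(res)
# Group (A, A): three zeros, one one   -> 3
# Group (B, A): one zero, two ones     -> 0.5
# Group (C, B): no zeros, one one      -> 0
# Group (C, C): one zero, no ones      -> Inf
```

If Inf is not acceptable for your use case, you would need to decide explicitly what a group without ones should return.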
Answer 1 (score: 2)
In the table output there are no 0 values for y=='C' & z=='B', so returning NA for that row may be reasonable. If so:
aggregate(x~y+z, d, function(x) {
  tb <- table(x)
  tb['0']/tb['1']
})
y z x
1 A A 3.0000000
2 B A 1.0000000
3 C A 2.0000000
4 A B 1.0000000
5 B B 0.3333333
6 C B NA
7 A C 1.0000000
8 B C 2.0000000
9 C C 1.0000000
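The NA arises because table only tabulates values that actually occur in the group, so looking up a missing name returns NA. The sketch below illustrates this on a single hand-made group (the vector `x_only_ones` and the other variable names are assumptions for illustration). It also shows one possible alternative, not taken from the answer: converting to a factor with explicit levels, which makes the missing count an explicit zero.

```r
# A group that contains only ones (illustrative data)
x_only_ones <- c(1, 1)

tb <- table(x_only_ones)
names(tb)                       # only "1"; there is no "0" entry
ratio_na <- tb["0"] / tb["1"]   # NA, because tb["0"] is NA

# Alternative sketch: fix the levels so both counts always exist
tb_full <- table(factor(x_only_ones, levels = c(0, 1)))
ratio_zero <- tb_full[["0"]] / tb_full[["1"]]   # 0 / 2

print(ratio_na)
print(ratio_zero)
```

Whether NA or 0 is the better result depends on how you want to interpret a group in which one of the two values never appears.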
Answer 2 (score: 0)
Try:
> d=data.frame(x=round(runif(30)),y=sample(LETTERS[1:3],30,TRUE),z=sample(LETTERS[1:3],30,TRUE))
> d
x y z
1 0 B C
2 1 A C
3 0 C C
4 0 C A
5 0 C B
6 0 B C
7 1 A A
8 1 B C
9 1 B A
10 1 C C
11 1 A A
12 0 B C
13 0 B B
14 0 A A
15 1 C B
16 1 C A
17 1 B C
18 1 C C
19 1 C A
20 0 B A
21 0 B A
22 0 A C
23 1 C A
24 0 C A
25 1 C B
26 0 C C
27 1 C A
28 1 B A
29 1 C B
30 1 B A
> aa = aggregate(x~y+z,d,table)
> aa
y z x
1 A A 1, 2
2 B A 2, 3
3 C A 2, 4
4 B B 1
5 C B 1, 3
6 A C 1, 1
7 B C 3, 2
8 C C 2, 2
> bb =data.frame(aa$x)
> cc = bb[,seq(2,length(bb),2)]
> dd = cc[2,]/cc[1,]
> aa$out = t(dd)
> aa
y z x 2
1 A A 1, 2 2.0000000
2 B A 2, 3 1.5000000
3 C A 2, 4 2.0000000
4 B B 1 1.0000000
5 C B 1, 3 3.0000000
6 A C 1, 1 1.0000000
7 B C 3, 2 0.6666667
8 C C 2, 2 1.0000000
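Note that the transcript above divides the second count by the first (for example 1, 2 gives 2.0), whereas the question asks for zeros over ones, and a group with a single observed value (row 4) still receives a ratio of 1.0. As a hedged alternative that does not depend on column positions, the sketch below counts zeros and ones in explicitly named columns before dividing. The data frame `toy` and the column names `zeros`, `ones`, and `ratio` are assumptions for illustration.

```r
# Small hand-made dataset (an assumption for illustration, not the answer's data)
toy <- data.frame(
  x = c(0, 0, 0, 1, 1, 1, 0, 1, 0),
  y = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  z = c("A", "A", "A", "A", "A", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Indicator columns: 1 where x is zero / one, 0 otherwise
toy$zeros <- as.integer(toy$x == 0)
toy$ones  <- as.integer(toy$x == 1)

# Sum both indicators per (y, z) group in a single aggregate call
counts <- aggregate(cbind(zeros, ones) ~ y + z, toy, sum)

# Ratio requested in the question: zeros divided by ones
counts$ratio <- counts$zeros / counts$ones
print(counts)
```

Keeping the two counts as separate columns also makes it easy to inspect groups where one of them is zero before computing the ratio.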