Using aggregate to divide counts

Date: 2014-07-30 00:02:01

Tags: r

Consider the following dataset:

set.seed(144)
d=data.frame(x=round(runif(30)),y=sample(LETTERS[1:3],30,TRUE),z=sample(LETTERS[1:3],30,TRUE))

However,

aggregate(x~y+z,d,table)

gives me this:

aggregate(x~y+z,d,table)
  y z    x
1 A A 3, 1
2 B A 2, 2
3 C A 2, 1
4 A B 2, 2
5 B B 1, 3
6 C B    2
7 A C 2, 2
8 B C 2, 1
9 C C 1, 1

whereas what I need in the x column is the ratio of the counts (the number of 0s divided by the number of 1s):

  y z         x
1 A A 3.0000000
2 B A 1.0000000
3 C A 2.0000000
4 A B 1.0000000
5 B B 0.3333333
6 C B 0.0000000
7 A C 1.0000000
8 B C 2.0000000
9 C C 1.0000000

3 answers:

Answer 0 (score: 2)

You can pass a user-defined function to aggregate to compute your ratio:

# Setting seed to make a reproducible example
set.seed(144)
d=data.frame(x=round(runif(30)),y=sample(LETTERS[1:3],30,TRUE),z=sample(LETTERS[1:3],30,TRUE))
head(aggregate(x~y+z, d, function(x) sum(x == 0) / sum(x == 1)))
#   y z         x
# 1 A A 3.0000000
# 2 B A 1.0000000
# 3 C A 2.0000000
# 4 A B 1.0000000
# 5 B B 0.3333333
# 6 C B 0.0000000
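
To see what the anonymous function does within each group, here is a minimal sketch on a small hand-built data frame. The name `toy` and its values are made up for illustration and do not come from the question. Note that a group containing no 1s gives `Inf`, because R returns `Inf` when a positive number is divided by zero.

```r
# Hypothetical toy data: x is 0/1, y is the grouping variable
toy <- data.frame(
  x = c(0, 0, 0, 1,  1, 1,  0, 0),
  y = c("A", "A", "A", "A", "B", "B", "C", "C")
)

# Number of zeros divided by number of ones within each group
res <- aggregate(x ~ y, toy, function(x) sum(x == 0) / sum(x == 1))
print(res)
```

Group A has three 0s and one 1, group B has only 1s, and group C has only 0s, so the three ratios are 3, 0, and `Inf` respectively.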

Answer 1 (score: 2)

When using table, there is no 0 value in the output for y=='C' & z=='B', so it may be reasonable to return NA for that row. If so:

aggregate(x~y+z, d, function(x) {
  tb <- table(x)
  tb['0']/tb['1']
})
  y z         x
1 A A 3.0000000
2 B A 1.0000000
3 C A 2.0000000
4 A B 1.0000000
5 B B 0.3333333
6 C B        NA
7 A C 1.0000000
8 B C 2.0000000
9 C C 1.0000000
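
If a missing count should instead be treated as 0 (which is what the desired output in the question shows for y=='C' & z=='B'), one option is to convert x to a factor with both levels before tabulating, so that `table` always reports a count for `0` and for `1`. The following is a minimal sketch on made-up data; `toy` is a hypothetical data frame, not the question's `d`.

```r
# Hypothetical toy data: group B contains no zeros at all
toy <- data.frame(
  x = c(0, 1, 1, 1,  1, 1),
  y = c("A", "A", "A", "A", "B", "B")
)

res <- aggregate(x ~ y, toy, function(x) {
  # Fixing the levels makes table() report a count of 0 for absent values
  tb <- table(factor(x, levels = c(0, 1)))
  tb[["0"]] / tb[["1"]]
})
print(res)
```

Here group B gets a ratio of 0 rather than `NA`, because its count of 0s is reported as zero instead of being absent from the table.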

Answer 2 (score: 0)

Try:

> d=data.frame(x=round(runif(30)),y=sample(LETTERS[1:3],30,TRUE),z=sample(LETTERS[1:3],30,TRUE))
> d
   x y z
1  0 B C
2  1 A C
3  0 C C
4  0 C A
5  0 C B
6  0 B C
7  1 A A
8  1 B C
9  1 B A
10 1 C C
11 1 A A
12 0 B C
13 0 B B
14 0 A A
15 1 C B
16 1 C A
17 1 B C
18 1 C C
19 1 C A
20 0 B A
21 0 B A
22 0 A C
23 1 C A
24 0 C A
25 1 C B
26 0 C C
27 1 C A
28 1 B A
29 1 C B
30 1 B A
> aa = aggregate(x~y+z,d,table)
> aa
  y z    x
1 A A 1, 2
2 B A 2, 3
3 C A 2, 4
4 B B    1
5 C B 1, 3
6 A C 1, 1
7 B C 3, 2
8 C C 2, 2
> bb =data.frame(aa$x)
> cc = bb[,seq(2,length(bb),2)]
> dd = cc[2,]/cc[1,]
> aa$out = t(dd)
> aa
  y z    x         2
1 A A 1, 2 2.0000000
2 B A 2, 3 1.5000000
3 C A 2, 4 2.0000000
4 B B    1 1.0000000
5 C B 1, 3 3.0000000
6 A C 1, 1 1.0000000
7 B C 3, 2 0.6666667
8 C C 2, 2 1.0000000
>
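
The idea of collecting the counts first and dividing them afterwards can also be sketched more directly: let the aggregating function return both counts as a named vector, which `aggregate` stores as a matrix column, and then divide the columns of that matrix. The data frame `toy` and the names `n0`, `n1`, and `ratio` are made up for illustration; this is one possible variant under those assumptions, not the code from the answer above.

```r
# Hypothetical toy data: x is 0/1, y is the grouping variable
toy <- data.frame(
  x = c(0, 0, 1,  0, 1, 1, 1,  1),
  y = c("A", "A", "A", "B", "B", "B", "B", "C")
)

# Each group yields two counts, so the x column of the result is a matrix
counts <- aggregate(x ~ y, toy,
                    function(x) c(n0 = sum(x == 0), n1 = sum(x == 1)))

# Divide the zero counts by the one counts, row by row
counts$ratio <- counts$x[, "n0"] / counts$x[, "n1"]
print(counts)
```

Because both counts are always present, a group without 0s (group C here) gets a ratio of 0, and no positional indexing into a ragged list of tables is needed.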