Keeping zero-count combinations when aggregating with data.table

Asked: 2013-05-13 10:05:11

Tags: r data.table

Suppose I have the following data.table:

dt <- data.table(id = c(rep(1, 5), rep(2, 4)),
                 sex = c(rep("H", 5), rep("F", 4)), 
                 fruit = c("apple", "tomato", "apple", "apple", "orange", "apple", "apple", "tomato", "tomato"),
                 key = "id")

   id sex  fruit
1:  1   H  apple
2:  1   H tomato
3:  1   H  apple
4:  1   H  apple
5:  1   H orange
6:  2   F  apple
7:  2   F  apple
8:  2   F tomato
9:  2   F tomato

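Each row records that a person (identified by id and sex) ate a fruit. I want to count how many times each fruit was eaten by each sex. I can do that with: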

dt[ , .N, by = c("fruit", "sex")]

which gives:

    fruit sex N
1:  apple   H 3
2: tomato   H 1
3: orange   H 1
4:  apple   F 2
5: tomato   F 2

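The problem is that this drops the count for orange with sex == "F", because that count is 0. Is there a way to do this aggregation without losing the zero-count combinations?

To be clear, the desired result is: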

   fruit sex N
1:  apple   H 3
2: tomato   H 1
3: orange   H 1
4:  apple   F 2
5: tomato   F 2
6: orange   F 0

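Thanks a lot!

2 answers:

Answer 0 (score: 9)

The simplest approach seems to be to supply every category combination explicitly, as a data.table passed to i=, and to set by=.EACHI so that the count is computed once per combination: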

setkey(dt, sex, fruit)
dt[CJ(sex, fruit, unique = TRUE), .N, by = .EACHI]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1
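
As a self-contained sketch of the same idea, the version below builds the grid of combinations outside the join and uses on= instead of setting a key. The names grid and res are illustrative and not part of the answer, and on= assumes a data.table version that supports it (1.9.6 or later).

```r
library(data.table)

dt <- data.table(id = c(rep(1, 5), rep(2, 4)),
                 sex = c(rep("H", 5), rep("F", 4)),
                 fruit = c("apple", "tomato", "apple", "apple", "orange",
                           "apple", "apple", "tomato", "tomato"))

# Every observed sex crossed with every observed fruit
# (grid is an illustrative name, not from the answer).
grid <- CJ(sex = dt$sex, fruit = dt$fruit, unique = TRUE)

# Join dt to the grid and count matching rows once per grid row;
# grid rows with no match in dt get .N = 0.
res <- dt[grid, .N, by = .EACHI, on = .(sex, fruit)]
print(res)
```

Because the join is driven by grid, the result has one row per combination regardless of whether that combination occurs in dt.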

Answer 1 (score: 7)

One way is to convert sex to a factor and group by fruit (id seems redundant here?):

dt[, sex := factor(sex)]
dt[, .(sex=levels(sex), N=c(table(sex))), by=fruit]
#     fruit sex N
# 1:  apple   F 2
# 2:  apple   H 3
# 3: tomato   F 2
# 4: tomato   H 1
# 5: orange   F 0
# 6: orange   H 1

Or you could convert fruit to a factor and group by sex:

dt[, fruit := factor(fruit)]
dt[, .(fruit = levels(fruit), N=c(table(fruit))),by=sex]
#    sex  fruit N
# 1:   H  apple 3
# 2:   H orange 1
# 3:   H tomato 1
# 4:   F  apple 2
# 5:   F orange 0
# 6:   F tomato 2
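
Both variants above appear to rely on the fact that a factor keeps its full set of levels even inside a group where some levels do not occur, so table() still reports them with a count of zero. A minimal base R sketch of that behaviour, using made-up values rather than the question's data:

```r
# A factor remembers all of its levels, even those absent from a subset.
fruit <- factor(c("apple", "tomato", "apple", "orange"))
subset_f <- fruit[fruit != "orange"]   # "orange" no longer occurs here

# table() counts per level, so the unused "orange" level is still listed.
counts <- table(subset_f)
print(counts)
```

With a plain character vector instead of a factor, table() would list only the values that are present, which is why the conversion to factor comes first.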

Edit:

However, I suspect that relying on table may not be a good idea if your data.table is large. In that case, using CJ from your earlier question may be the way to go. That is, aggregate first and then join.

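# Aggregate first, then join against every sex/fruit combination;
# combinations absent from the data come back with N = NA, which is set to 0.
out <- setkey(dt, sex, fruit)[, .N, by = "sex,fruit"][
  CJ(c("H", "F"), c("apple", "tomato", "orange")),
  allow.cartesian = TRUE][is.na(N), N := 0L]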
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1
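
The aggregate-then-join idea can also be sketched step by step without relying on keys. This is an illustrative variant, not the answer's exact code: counts and grid are hypothetical names, and on= assumes data.table 1.9.6 or later.

```r
library(data.table)

dt <- data.table(sex = c(rep("H", 5), rep("F", 4)),
                 fruit = c("apple", "tomato", "apple", "apple", "orange",
                           "apple", "apple", "tomato", "tomato"))

# Step 1: aggregate, which yields only the combinations present in the data.
counts <- dt[, .N, by = .(sex, fruit)]

# Step 2: build every sex/fruit combination from the observed values.
grid <- CJ(sex = dt$sex, fruit = dt$fruit, unique = TRUE)

# Step 3: join the counts onto the grid; missing combinations get N = NA.
out <- counts[grid, on = .(sex, fruit)]

# Step 4: replace the NA counts with 0 by reference.
out[is.na(N), N := 0L]
print(out)
```

Aggregating before the join means the join works on one row per observed combination rather than on the full table, which is the motivation given above for large inputs.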