How can I count the number of groups in R?

Date: 2015-12-03 23:40:48

Tags: r data.table

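This may be a very simple question. I have a keyed data.table with more than 1000 rows, in which two columns can be set as the key. I want to count the number of groups in this dataset.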

For example, here is some simple data (ID and Act are the key columns):

ID  ValueDate Act Volume
1 2015-01-01 EUR     21
1 2015-02-01 EUR     22
1 2015-01-01 MAD     12
1 2015-02-01 MAD     11
2 2015-01-01 EUR      5
2 2015-02-01 EUR      7
3 2015-01-01 EUR      4
3 2015-02-01 EUR      2
3 2015-03-01 EUR      6

Here is the code to generate the test data:

library(data.table)
dd <- data.table(ID = c(1,1,1,1,2,2,3,3,3), 
                 ValueDate = c("2015-01-01", "2015-02-01", "2015-01-01","2015-02-01", "2015-01-01","2015-02-01","2015-01-01","2015-02-01","2015-03-01"),
                 Act = c("EUR","EUR","MAD","MAD","EUR","EUR","EUR","EUR","EUR"),
                 Volume=c(21,22,12,11,5,7,4,2,6))

In this case, we can see that there are 4 groups in total.

I first tried setting the key of this table:

setkey(dd, ID, Act)

Then I thought the count function might count the number of groups. Is using count the right approach, or is there a simpler way?

Thanks a lot!

2 Answers:

Answer 0: (score: 3)

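# count the groups: the grouped result has one row per (ID, Act) combination
nrow(dd[, .(cnt = .N), by = c("ID", "Act")])

# or using base R: tabulate the ID/Act combinations and count the non-empty cells
{tab <- table(interaction(dd$ID, dd$Act)); length(tab[tab > 0])}

# or for the counts:
dd[, .(cnt = .N), by = c("ID", "Act")]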
   ID Act cnt
1:  1 EUR   2
2:  1 MAD   2
3:  2 EUR   2
4:  3 EUR   3
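
As a further illustration of the grouping idea above (not part of the original answer), here is a minimal sketch that uses data.table's `.GRP` symbol, which numbers the groups in order of appearance. The names `groups`, `grp`, and `n_groups` are made up for this example, and the ValueDate column is left out because it plays no role in the grouping.

```r
library(data.table)

# Same test data as in the question (ValueDate omitted for brevity)
dd <- data.table(
  ID     = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Act    = c("EUR", "EUR", "MAD", "MAD", "EUR", "EUR", "EUR", "EUR", "EUR"),
  Volume = c(21, 22, 12, 11, 5, 7, 4, 2, 6)
)

# .GRP is the running group index, so each (ID, Act) pair gets one row
groups <- dd[, .(grp = .GRP), by = c("ID", "Act")]

# The number of groups is the number of rows in the grouped result
n_groups <- nrow(groups)
print(n_groups)
```

With the sample data this should agree with the 4 groups counted above.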

Answer 1: (score: 3)

The fastest way should be uniqueN:

library(data.table)
dd <- data.table(ID = c(1,1,1,1,2,2,3,3,3), 
                 ValueDate = c("2015-01-01", "2015-02-01", "2015-01-01","2015-02-01", "2015-01-01","2015-02-01","2015-01-01","2015-02-01","2015-03-01"),
                 Act = c("EUR","EUR","MAD","MAD","EUR","EUR","EUR","EUR","EUR"),
                 Volume=c(21,22,12,11,5,7,4,2,6))
uniqueN(dd, by = c("ID", "Act"))
#[1] 4
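
Since the question sets a key with `setkey(dd, ID, Act)`, the key columns can also be passed to `by` instead of typing them out again. This is a small sketch under that assumption; `n_groups` and `n_groups_alt` are illustrative names, and the ValueDate column is omitted because it does not affect the grouping.

```r
library(data.table)

# Same test data as in the question (ValueDate omitted for brevity)
dd <- data.table(
  ID     = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  Act    = c("EUR", "EUR", "MAD", "MAD", "EUR", "EUR", "EUR", "EUR", "EUR"),
  Volume = c(21, 22, 12, 11, 5, 7, 4, 2, 6)
)
setkey(dd, ID, Act)

# key(dd) returns the key columns as a character vector: c("ID", "Act")
n_groups <- uniqueN(dd, by = key(dd))

# Equivalent check: keep one row per key combination, then count the rows
n_groups_alt <- nrow(unique(dd, by = key(dd)))

print(n_groups)
```

Using `key(dd)` keeps the count in step with the key if the key columns change later, at the cost of requiring that a key has actually been set.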