How do you subset and count in an R data.table?

Asked: 2017-10-31 11:44:40

Tags: r datatable aggregation

I have a data set similar to this:

library(data.table)

a = data.table(
  ID = c(1, 1, 2, 2, 2, 3, 3),
  TOUR = c("USA", "CHINA", "CHINA", "CHINA", "EUROPE", "CANADA", "USA")
)

I would like to aggregate the data to produce this:

[Image: What I'm aiming for]

I want to do this with data.table. Can anyone tell me how?

Thanks, Phil

1 Answer:

Answer 0: (score: 3)

We can first create "NUMBER_OF_BOOKINGS" as .N (the number of rows) grouped by "ID", and then dcast with fun.aggregate set to length:

dcast(a[, NUMBER_OF_BOOKINGS := .N, ID], ID + NUMBER_OF_BOOKINGS ~ TOUR, length)
#    ID NUMBER_OF_BOOKINGS CANADA CHINA EUROPE USA
#1:  1                  2      0     1      0   1
#2:  2                  3      0     2      1   0
#3:  3                  2      1     0      0   1

If we need the "TOUR_" prefix on the new column names, use paste0 in the formula:

dcast(a[, NUMBER_OF_BOOKINGS := .N, ID], ID + NUMBER_OF_BOOKINGS ~ 
                    paste0("TOUR_", TOUR), length)

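The approach above also adds a column to the original dataset, because := assigns by reference. If we want to avoid that, we can compute the counts separately and join them to the dcast result: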

a[, .(NUMBER_OF_BOOKINGS = .N), ID][dcast(a, ID ~ paste0("TOUR_", TOUR), length), on = .(ID)]
#   ID NUMBER_OF_BOOKINGS TOUR_CANADA TOUR_CHINA TOUR_EUROPE TOUR_USA
#1:  1                  2           0          1           0        1
#2:  2                  3           0          2           1        0
#3:  3                  2           1          0           0        1
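
As a side note on the by-reference behaviour of := mentioned above, here is a minimal sketch of another way to leave the original table untouched, assuming it is acceptable to work on a copy. The names b and wide are hypothetical and not part of the original answer, and value.var = "TOUR" is set explicitly here only to avoid relying on dcast guessing the value column.

```r
library(data.table)

a <- data.table(
  ID = c(1, 1, 2, 2, 2, 3, 3),
  TOUR = c("USA", "CHINA", "CHINA", "CHINA", "EUROPE", "CANADA", "USA")
)

# copy() makes a separate table, so := modifies the copy and not `a`
b <- copy(a)[, NUMBER_OF_BOOKINGS := .N, by = ID]

# Reshape the copy to wide format, counting rows per ID and TOUR
wide <- dcast(b, ID + NUMBER_OF_BOOKINGS ~ TOUR,
              fun.aggregate = length, value.var = "TOUR")

names(a)  # "ID" "TOUR"
names(b)  # "ID" "TOUR" "NUMBER_OF_BOOKINGS"
```

The trade-off is that copy() duplicates the data in memory, so for a large table the join shown above may be the better choice.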