R data.table: complex filtering, grouping, and assignment

Time: 2016-06-22 21:33:36

Tags: r data.table grouping

library(data.table)
dt <- data.table(structure(list(helpfulDescriptor = structure(c(1L, 1L, 1L, 1L,1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("ugly_carpet","zestyTomato", "brexit-Vote"), class = "factor"), eventDate = structure(c(15162,15162, 15249, 15249, 15249, 15249, 15250, 15250, 15250, 15250,16868, 16883, 16883, 16883, 16883, 16883, 15414, 15414, 15414,15418, 15418, 16588, 16591, 16591, 15372, 15601, 15601, 16230,16423, 16577, 16577, 16827), class = "Date"), indicator = c(0L,0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("helpfulDescriptor","eventDate", "indicator"), class = c("data.table",     "data.frame")))

dt

    helpfulDescriptor  eventDate indicator
 1:       ugly_carpet 2011-07-07         0
 2:       ugly_carpet 2011-07-07         0
 3:       ugly_carpet 2011-10-02         0
 4:       ugly_carpet 2011-10-02         0
 5:       ugly_carpet 2011-10-02         0
 6:       ugly_carpet 2011-10-02         0
 7:       ugly_carpet 2011-10-03         0
 8:       ugly_carpet 2011-10-03         0
 9:       ugly_carpet 2011-10-03         0
10:       ugly_carpet 2011-10-03         0
11:       ugly_carpet 2016-03-08         0
12:       ugly_carpet 2016-03-23         0
13:       ugly_carpet 2016-03-23         0
14:       ugly_carpet 2016-03-23         0
15:       ugly_carpet 2016-03-23         0
16:       ugly_carpet 2016-03-23         0
17:       zestyTomato 2012-03-15         0
18:       zestyTomato 2012-03-15         0
19:       zestyTomato 2012-03-15         0
20:       zestyTomato 2012-03-19         0
21:       zestyTomato 2012-03-19         0
22:       zestyTomato 2015-06-02         0
23:       zestyTomato 2015-06-05         0
24:       zestyTomato 2015-06-05         0
25:       brexit-Vote 2012-02-02         0
26:       brexit-Vote 2012-09-18         0
27:       brexit-Vote 2012-09-18         0
28:       brexit-Vote 2014-06-09         0
29:       brexit-Vote 2014-12-19         0
30:       brexit-Vote 2015-05-22         0
31:       brexit-Vote 2015-05-22         0
32:       brexit-Vote 2016-01-27         0

I am struggling to use data.table to identify all rows that share the same most recent (maximum) date within each helpfulDescriptor group.

dt[eventDate == max(eventDate), indicator := 1L, by = c('helpfulDescriptor')]

This only sets indicator to 1 for the rows matching the maximum date of the entire data.table, not the maximum within each group...
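A likely explanation: in dt[i, j, by], the i expression is evaluated once against the whole table, and by is applied only to the rows that survive it, so max(eventDate) in i is the global maximum. A minimal sketch of one possible fix, moving the comparison into j so that max() is computed per group (the table `toy` is a hypothetical stand-in with the same column names, not the original data):

```r
library(data.table)

# Hypothetical stand-in for the table above: same columns, fewer rows.
toy <- data.table(
  helpfulDescriptor = c("a", "a", "a", "b", "b"),
  eventDate = as.Date(c("2016-03-08", "2016-03-23", "2016-03-23",
                        "2015-06-02", "2015-06-05")),
  indicator = 0L
)

# The comparison sits in j, so max(eventDate) is evaluated within each group.
toy[, indicator := as.integer(eventDate == max(eventDate)),
    by = helpfulDescriptor]

toy$indicator
# 0 1 1 0 1
```

Rows that tie for a group's latest date are all flagged, which matches the requirement of marking every row that shares the most recent date.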

My other attempts using tail, setkey(), .SD, etc. have failed.

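The correct solution would set indicator to 1 for the last (in the listing above) and most recent (by date) 5 rows of ugly_carpet, the 2 zestyTomato rows dated 2015-06-05, and the 1 final, most recent brexit-Vote row.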
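The expected counts can be checked with a short sketch. It rebuilds the 32 rows from the printed listing under the hypothetical name `check`, then uses the `.I` idiom to collect, per group, the row numbers whose date equals the group maximum. This is one possible approach under those assumptions, not necessarily the intended one:

```r
library(data.table)

# Rebuild the listing above; `check` is a hypothetical name for this copy.
check <- data.table(
  helpfulDescriptor = rep(c("ugly_carpet", "zestyTomato", "brexit-Vote"),
                          c(16, 8, 8)),
  eventDate = as.Date(c(
    rep(c("2011-07-07", "2011-10-02", "2011-10-03", "2016-03-08", "2016-03-23"),
        c(2, 4, 4, 1, 5)),
    rep(c("2012-03-15", "2012-03-19", "2015-06-02", "2015-06-05"),
        c(3, 2, 1, 2)),
    rep(c("2012-02-02", "2012-09-18", "2014-06-09", "2014-12-19",
          "2015-05-22", "2016-01-27"),
        c(1, 2, 1, 1, 2, 1))
  )),
  indicator = 0L
)

# .I holds row numbers in the full table; keep those at each group's max date.
idx <- check[, .I[eventDate == max(eventDate)], by = helpfulDescriptor]$V1

# Flag those rows by reference.
check[idx, indicator := 1L]

# Count flagged rows per group.
counts <- check[, .(flagged = sum(indicator)), by = helpfulDescriptor]
counts
# ugly_carpet 5, zestyTomato 2, brexit-Vote 1
```

Here `idx` comes out as rows 12 to 16, 23, 24, and 32, which are the rows described above.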

Thank you.

0 answers:

No answers