How to find a pattern in a data.table column

Time: 2017-02-13 16:33:16

Tags: r data.table

I have a data.table such as:

   ID                Time Event
 1: 1 2016-09-25 14:47:52     1
 2: 1 2016-10-03 19:35:04     1
 3: 1 2016-10-03 21:11:00    -1
 4: 1 2016-10-04 14:25:56     1
 5: 1 2016-11-05 01:40:13     1
 6: 1 2016-11-27 04:40:21     1
 7: 1 2016-12-04 02:36:37     1
 8: 1 2017-01-12 13:48:01     1
 9: 1 2017-01-15 03:32:35     1
10: 1 2017-02-05 01:35:07     1
11: 1 2017-02-05 02:29:31     1
12: 1 2017-02-05 02:34:33     1
13: 2 2016-07-15 08:14:11     1
14: 2 2016-07-22 22:15:44     1
15: 2 2016-07-23 12:00:00    -1
16: 2 2016-11-30 18:21:51     1
17: 2 2016-12-03 07:00:31     1
18: 2 2016-12-06 06:30:34     1
19: 2 2016-12-16 10:00:50     1
20: 2 2017-01-16 08:33:16     1

I am trying to check, grouped by ID, whether a positive event occurred after a negative event. My ideal output is a data.table with:

ID Outcome
1    TRUE
2    TRUE

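I don't know how to formulate a filter condition that takes both the Time column and the Event column into account: I want to know whether, for a given ID, there is an Event = 1 with Time > the time of the Event = -1 ... but I cannot express this in code. Can anyone help?

I attach a demo data set here: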

fakedata <- structure(list(ID = c(1L, 1L, 1L, 
                   1L, 1L, 1L, 1L, 1L, 
                   1L, 1L, 1L, 1L, 2L, 
                   2L, 2L, 2L, 2L, 2L, 
                   2L, 2L), Time = c("2016-09-25 14:47:52", "2016-10-03 19:35:04", 
                                                       "2016-10-03 21:11:00", "2016-10-04 14:25:56", "2016-11-05 01:40:13", 
                                                       "2016-11-27 04:40:21", "2016-12-04 02:36:37", "2017-01-12 13:48:01", 
                                                       "2017-01-15 03:32:35", "2017-02-05 01:35:07", "2017-02-05 02:29:31", 
                                                       "2017-02-05 02:34:33", "2016-07-15 08:14:11", "2016-07-22 22:15:44", 
                                                       "2016-07-23 12:00:00", "2016-11-30 18:21:51", "2016-12-03 07:00:31", 
                                                       "2016-12-06 06:30:34", "2016-12-16 10:00:50", "2017-01-16 08:33:16"
                   ), Event = c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 
                                1, 1, 1, 1)), .Names = c("ID", "Time", "Event"), class = c("data.table", 
                                                                                            "data.frame"), row.names = c(NA, -20L))

1 Answer:

Answer 0: (score: 1)

Here is a data.table method that uses the base R functions any and which together with the && operator.

fakedata[order(ID, as.POSIXct(Time)),
         .(outcome=any(Event == -1) && Event[which(Event == -1)+1] > 0), by=ID]
   ID outcome
1:  1    TRUE
2:  2    TRUE

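As david-arenburg mentions in the comments, it is a good idea to make sure the data set is properly ordered before the calculation. With data.table, we can do this in the i argument. Following david-arenburg's comment, I ordered it on ID and then on as.POSIXct(Time).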
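The ordering step can be tried on its own. This is a minimal sketch on a small, made-up table (the rows and the tz = "UTC" choice are assumptions, not part of the question); it shows ordering in i, which returns a sorted copy, next to setorder(), which sorts by reference.

```r
library(data.table)

# Hypothetical, deliberately unsorted rows for a single ID.
dt <- data.table(
  ID    = c(1L, 1L, 1L),
  Time  = c("2016-10-04 14:25:56", "2016-10-03 19:35:04", "2016-10-03 21:11:00"),
  Event = c(1, 1, -1)
)

# Ordering in i returns a sorted copy; dt itself is left unchanged.
sorted <- dt[order(ID, as.POSIXct(Time, tz = "UTC"))]

# setorder() sorts by reference instead, which avoids the copy.
dt[, Time := as.POSIXct(Time, tz = "UTC")]
setorder(dt, ID, Time)
```

Either form puts the negative event between the two positive ones, so a later step that looks at "the next row" sees the events in chronological order.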

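In the j argument, .(outcome=any(Event==-1) && Event[which(Event == -1)+1] > 0), the call any(Event == -1) checks whether there is any -1 at all. If there is, Event[which(Event == -1)+1] > 0 checks whether the event immediately following each -1 is positive. If the first check fails, && short-circuits and FALSE is returned.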
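The expression above looks only at the row immediately after a -1 and implicitly assumes a single -1 per ID. With several negative events the right-hand side of && has length greater than one (which recent R versions reject), and a -1 in the last row yields NA. A possible variant that follows the wording of the question more literally (any Event = 1 later than the first Event = -1) is sketched below. The data and the name res are made up for illustration, and this is not the answerer's code.

```r
library(data.table)

# Hypothetical data: ID 1 has a positive event after its negative event,
# ID 2 ends on a negative event, ID 3 has no negative event at all.
dt <- data.table(
  ID    = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  Time  = c("2016-10-03 19:35:04", "2016-10-03 21:11:00", "2016-10-04 14:25:56",
            "2016-07-22 22:15:44", "2016-07-23 12:00:00",
            "2016-11-30 18:21:51", "2016-12-03 07:00:31"),
  Event = c(1, -1, 1, 1, -1, 1, 1)
)

# Parse the timestamps once so comparisons use real date-times.
dt[, Time := as.POSIXct(Time, tz = "UTC")]

# Per ID: TRUE if some Event == 1 is later than the earliest Event == -1.
# length(neg) > 0 guards min() against IDs without any negative event.
res <- dt[, {
  neg <- Time[Event == -1]
  list(Outcome = length(neg) > 0 && any(Event == 1 & Time > min(neg)))
}, by = ID]
```

Because the comparison is made on Time rather than on row position, this version does not depend on the rows being sorted beforehand.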