How to capture the next event based on a condition

Time: 2015-06-10 00:04:09

Tags: r events

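I have a huge dataset, but I will explain the problem with an example. In the table below I have a Unique ID, a Date (a time series for each Unique ID), and the Status of the Unique ID on that date.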

Unique ID   Date    Status 
        1   Jan-06  Active 
        1   Feb-06  Active 
        1   Mar-06  Not active 
        1   Apr-06  Stable 
        1   May-06  Active 
        1   Jun-06  Stable 
        1   Jul-06  Active 
        1   Aug-06  Active 
        1   Sep-06  Active 
        2   Oct-06  Active 
        2   Nov-06  Not active 
        2   Dec-06  Stable 
        2   Jan-07  Active 
        2   Feb-07  Stable 
        2   Mar-07  Active 
        2   Apr-07  Active 
        2   May-07  Active 

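The result I want is the date of the next event that occurs (Not active or Stable).

As you can see below, Unique ID 1 is Active in Jan-06, and we need to capture when it next reaches Not active or Stable, which happens in Mar-06.

Likewise, Unique ID 1 is Active in May-06, and we need to capture when it next reaches Not active or Stable, which happens in Jun-06.

Note: no date is needed for rows where the ID is already Not active or Stable.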

Unique ID   Date    Status  Result 
        1   Jan-06  Active  Mar-06
        1   Feb-06  Active  Apr-06
        1   Mar-06  Not active  NA
        1   Apr-06  Stable  NA 
        1   May-06  Active  Jun-06
        1   Jun-06  Stable  NA 
        1   Jul-06  Active  Always active 
        1   Aug-06  Active  Always active 
        1   Sep-06  Active  Always active 
        2   Oct-06  Active  Nov-06
        2   Nov-06  Not active  NA 
        2   Dec-06  Stable  NA 
        2   Jan-07  Active  Feb-07
        2   Feb-07  Stable  NA
        2   Mar-07  Active  Always active 
        2   Apr-07  Active  Always active 
        2   May-07  Active  Always active 

1 Answer:

Answer 0: (score: 0)

Generate example data:

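> # Note: sample() changed its default algorithm in R 3.6.0, so newer versions may draw different statuses than shown below
> set.seed(1234)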
> id <- rep(1:3, times=c(10,10,10))
> date <- seq(as.Date("2000/1/1"), by = "month", length.out = length(id))
> date <- format(date,  "%b-%y")
> status <- sample(c("Active", "Not Active", "Stable"), length(id), replace=TRUE)
> data.frame(id, date, status)
   id   date     status
1   1 Jan-00     Active
2   1 Feb-00 Not Active
3   1 Mar-00 Not Active
4   1 Apr-00 Not Active
5   1 May-00     Stable
6   1 Jun-00 Not Active
7   1 Jul-00     Active
8   1 Aug-00     Active
9   1 Sep-00 Not Active
10  1 Oct-00 Not Active
11  2 Nov-00     Stable
12  2 Dec-00 Not Active
13  2 Jan-01     Active
14  2 Feb-01     Stable
15  2 Mar-01     Active
16  2 Apr-01     Stable
17  2 May-01     Active
18  2 Jun-01     Active
19  2 Jul-01     Active
20  2 Aug-01     Active
21  3 Sep-01     Active
22  3 Oct-01     Active
23  3 Nov-01     Active
24  3 Dec-01     Active
25  3 Jan-02     Active
26  3 Feb-02     Stable
27  3 Mar-02 Not Active
28  3 Apr-02     Stable
29  3 May-02     Stable
30  3 Jun-02     Active

In copies of the date and status variables, replace the last "Active" observation of each id with "Always Active":

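> # TRUE where a row is the last one for its id and that row is "Active"
> id_last_active <- (id != c(id[-1], FALSE)) & (status == "Active")
> # Work on copies so the original date and status stay untouched
> date2 <- as.character(date)
> date2[id_last_active] <- "Always Active"
> status2 <- status
> status2[id_last_active] <- "Always Active"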
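
To make the comparison `id != c(id[-1], FALSE)` less cryptic, here is a small sketch on made-up vectors (the `toy_` names are hypothetical and not part of the answer's data). It assumes the rows are sorted by id and that no id equals 0, since the `FALSE` pad is coerced to 0.

```r
# Toy data (hypothetical): two ids with three observations each
toy_id     <- c(1, 1, 1, 2, 2, 2)
toy_status <- c("Active", "Stable", "Active", "Active", "Active", "Stable")

# Shift the ids one position to the left; the FALSE pad (coerced to 0)
# makes the very last row count as the end of its group
toy_is_last <- toy_id != c(toy_id[-1], FALSE)

# Keep only group-ending rows that are also "Active"
toy_last_active <- toy_is_last & (toy_status == "Active")

print(toy_is_last)      # FALSE FALSE  TRUE FALSE FALSE  TRUE
print(toy_last_active)  # FALSE FALSE  TRUE FALSE FALSE FALSE
```

Only the third row is flagged: it closes id 1 and is "Active". The last row of id 2 closes its group too, but it is "Stable", so it is left alone.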

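Use which() to get the positions of the non-"Active" observations (regardless of id), and use diff() and rep() to build a vector that gives, for each row, the position of the next non-"Active" event following the preceding run of "Active" events. Then look up the date of that next non-"Active" event for each "Active" row: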

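> # Positions of every row that is not plain "Active" (this includes "Always Active")
> nonactive <- which(status2 != "Active")
> # Number of rows each of those positions has to cover: the gap since the previous one
> status_len <- c(nonactive[1], diff(nonactive))
> # Repeat each position over its run, then look up the date stored there
> next_event <- date2[rep(nonactive, times = status_len)]
> next_event[status2 %in% c("Stable", "Not Active")] <- NA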
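
The which()/diff()/rep() step may be easier to follow on a six-row example. The sketch below uses made-up vectors (all `toy_` names are hypothetical) that are assumed to have already gone through the "Always Active" replacement.

```r
# Hypothetical toy vectors, after the "Always Active" replacement step
toy_status2 <- c("Active", "Active", "Stable", "Active", "Not Active", "Always Active")
toy_date2   <- c("Jan-00", "Feb-00", "Mar-00", "Apr-00", "May-00", "Always Active")

# Positions of rows that are not plain "Active"
toy_nonactive <- which(toy_status2 != "Active")            # 3 5 6

# Gap back to the previous such position = number of rows each one covers
toy_len <- c(toy_nonactive[1], diff(toy_nonactive))        # 3 2 1

# One "next event" position per row of the data
toy_pos <- rep(toy_nonactive, times = toy_len)             # 3 3 3 5 5 6

# Look up the dates, then blank out rows that are themselves events
toy_next <- toy_date2[toy_pos]
toy_next[toy_status2 %in% c("Stable", "Not Active")] <- NA

print(toy_next)
# "Mar-00" "Mar-00" NA "May-00" NA "Always Active"
```

Because the last row of every id is either a non-"Active" event or has been marked "Always Active", a run of "Active" rows should never pick up a date from the following id, even though the positions are computed without looking at id.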

Print the data by id (using by()):

> dat <- data.frame(id, date, status, next_event)
> by(dat, dat$id, function(x) x)
dat$id: 1
   id   date     status next_event
1   1 Jan-00     Active     Feb-00
2   1 Feb-00 Not Active       <NA>
3   1 Mar-00 Not Active       <NA>
4   1 Apr-00 Not Active       <NA>
5   1 May-00     Stable       <NA>
6   1 Jun-00 Not Active       <NA>
7   1 Jul-00     Active     Sep-00
8   1 Aug-00     Active     Sep-00
9   1 Sep-00 Not Active       <NA>
10  1 Oct-00 Not Active       <NA>
--------------------------------------------------------------- 
dat$id: 2
   id   date     status    next_event
11  2 Nov-00     Stable          <NA>
12  2 Dec-00 Not Active          <NA>
13  2 Jan-01     Active        Feb-01
14  2 Feb-01     Stable          <NA>
15  2 Mar-01     Active        Apr-01
16  2 Apr-01     Stable          <NA>
17  2 May-01     Active Always Active
18  2 Jun-01     Active Always Active
19  2 Jul-01     Active Always Active
20  2 Aug-01     Active Always Active
--------------------------------------------------------------- 
dat$id: 3
   id   date     status    next_event
21  3 Sep-01     Active        Feb-02
22  3 Oct-01     Active        Feb-02
23  3 Nov-01     Active        Feb-02
24  3 Dec-01     Active        Feb-02
25  3 Jan-02     Active        Feb-02
26  3 Feb-02     Stable          <NA>
27  3 Mar-02 Not Active          <NA>
28  3 Apr-02     Stable          <NA>
29  3 May-02     Stable          <NA>
30  3 Jun-02     Active Always Active
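
For completeness, here is a sketch that applies the same steps to the example data from the question. The helper `next_event_date()` and the data frame name `qdat` are hypothetical names introduced only for this illustration; the data is assumed to be sorted by id and date, and the question's label "Always active" is used instead of "Always Active".

```r
# The question's example data, typed in by hand
qdat <- data.frame(
  id     = rep(1:2, times = c(9, 8)),
  date   = c("Jan-06", "Feb-06", "Mar-06", "Apr-06", "May-06", "Jun-06",
             "Jul-06", "Aug-06", "Sep-06", "Oct-06", "Nov-06", "Dec-06",
             "Jan-07", "Feb-07", "Mar-07", "Apr-07", "May-07"),
  status = c("Active", "Active", "Not active", "Stable", "Active", "Stable",
             "Active", "Active", "Active",
             "Active", "Not active", "Stable", "Active", "Stable",
             "Active", "Active", "Active"),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the steps above; assumes no id equals 0
next_event_date <- function(id, date, status, always = "Always active") {
  status <- as.character(status)
  # Last row of each id, if that row is "Active"
  last_active <- (id != c(id[-1], FALSE)) & (status == "Active")
  date2   <- as.character(date)
  status2 <- status
  date2[last_active]   <- always
  status2[last_active] <- always
  # Positions of non-"Active" rows and the number of rows each one covers
  nonactive <- which(status2 != "Active")
  run_len   <- c(nonactive[1], diff(nonactive))
  out <- date2[rep(nonactive, times = run_len)]
  # Rows that are not "Active" themselves get no date
  out[status != "Active"] <- NA
  out
}

qdat$result <- next_event_date(qdat$id, qdat$date, qdat$status)
print(qdat)
```

Under the rule stated in the question, the Feb-06 row comes out as Mar-06 here, whereas the question's expected table lists Apr-06 for that row; every other row matches the expected Result column.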