How to create "time since last event" in unbalanced panel data in R?

Date: 2015-10-31 15:13:34

Tags: r panel-data

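I have unbalanced panel data with a binary variable indicating whether an event occurred. I want to control for time dependence, so I want to create a variable that indicates the number of years that have passed since the last event. The data are organized in dyad-years.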

Here is a reproducible example, including the vector I want to create. Thanks!

   id year onset time_since_event
1   1 1989     0                1
2   1 1990     0                2
3   1 1991     1                0
4   1 1992     0                1
5   1 1993     0                2
6   2 1989     0                1
7   2 1990     1                0
8   2 1991     0                1
9   2 1992     1                0
10  3 1991     0                1
11  3 1992     0                2

id <- c(1,1,1,1,1,2,2,2,2,3,3)
year <- c(1989,1990,1991,1992,1993,1989,1990,1991,1992,1991,1992)
onset <- c(0,0,1,0,0,0,1,0,1,0,0)
time_since_event<-c(1,2,0,1,2,1,0,1,0,1,2) #what I want to create
df <- data.frame(cbind(id, year, onset,time_since_event))

1 answer:

Answer 0 (score: 1)

We can use rleid() from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)) and create a run-length id grouping variable ('ind') based on the 'onset' column. For the rows where 'onset' is not equal to 1, we assign the row sequence within each 'ind' and 'id' group (1:.N) to the 'time_since_event' column. In the next step, we replace the NA elements of 'time_since_event' with 0.
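
To make the 'ind' grouping variable concrete, the run ids can be reproduced in base R, either with the same diff/cumsum trick the dplyr version uses or with rle(). This is a minimal sketch on the example's 'onset' vector, assuming rleid() on a single vector behaves like these base R equivalents. Note that the ids run across 'id' boundaries (rows 4 to 6 share run 3 although row 6 belongs to id 2), which is why the answer also groups by 'id'.

```r
onset <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)

# A new run starts at the first element and wherever 'onset' changes value
run_id_diff <- cumsum(c(TRUE, diff(onset) != 0))

# The same ids from base R's run-length encoding
r <- rle(onset)
run_id_rle <- rep(seq_along(r$lengths), r$lengths)

run_id_diff  # 1 1 2 3 3 3 4 5 6 7 7
```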

library(data.table) # v1.9.6+
setDT(df)[, ind := rleid(onset)][
  onset != 1, time_since_event := 1:.N, by = .(ind, id)][
  is.na(time_since_event), time_since_event := 0]
df
#    id year onset ind time_since_event
# 1:  1 1989     0   1                1
# 2:  1 1990     0   1                2
# 3:  1 1991     1   2                0
# 4:  1 1992     0   3                1
# 5:  1 1993     0   3                2
# 6:  2 1989     0   3                1
# 7:  2 1990     1   4                0
# 8:  2 1991     0   5                1
# 9:  2 1992     1   6                0
#10:  3 1991     0   7                1
#11:  3 1992     0   7                2

Or it can be done more compactly. Grouping by rleid(onset) and the 'id' column, we negate 'onset' (so that 0 becomes TRUE and 1 becomes FALSE), multiply it by the row sequence (1:.N), and assign (:=) the result as the 'time_since_event' column.
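
Because 'onset' is constant within a run, the negate-and-multiply step either keeps the whole row sequence (a run of 0s) or zeroes it out (a run of 1s). A small base R illustration on two hypothetical runs:

```r
# A run of non-events: !0 is TRUE, so the row sequence is kept
zeros_run <- seq_along(c(0, 0, 0)) * !c(0, 0, 0)
zeros_run  # 1 2 3

# A run of events: !1 is FALSE, so every row becomes 0
ones_run <- seq_along(c(1, 1)) * !c(1, 1)
ones_run   # 0 0
```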

setDT(df)[, time_since_event := 1:.N * !onset, by = .(rleid(onset), id)]
df
#    id year onset time_since_event
# 1:  1 1989     0                1
# 2:  1 1990     0                2
# 3:  1 1991     1                0
# 4:  1 1992     0                1
# 5:  1 1993     0                2
# 6:  2 1989     0                1
# 7:  2 1990     1                0
# 8:  2 1991     0                1
# 9:  2 1992     1                0
#10:  3 1991     0                1
#11:  3 1992     0                2

Or we can use dplyr. We group by 'id' and another variable, 'ind', created by taking the differences between adjacent elements of 'onset' (diff), turning them into a logical index (!= 0), and taking the cumulative sum of that index (cumsum). Inside mutate, we multiply the row sequence (row_number()) by the negated 'onset' (as before), and finally remove the 'ind' column with select.

library(dplyr)
df %>% 
    group_by(id, ind= cumsum(c(TRUE, diff(onset)!=0))) %>% 
    mutate(time_since_event= (!onset) *row_number()) %>%
    ungroup() %>%
    select(-ind) 
#     id  year onset time_since_event
#   (dbl) (dbl) (dbl)            (int)
#1      1  1989     0                1
#2      1  1990     0                2
#3      1  1991     1                0
#4      1  1992     0                1
#5      1  1993     0                2
#6      2  1989     0                1
#7      2  1990     1                0
#8      2  1991     0                1
#9      2  1992     1                0
#10     3  1991     0                1
#11     3  1992     0                2
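
For comparison, the same technique (a run id, then the row number within each id/run group multiplied by the negated 'onset') can be sketched in base R with ave(), without data.table or dplyr. This is an illustration, not part of the original answer. Like the versions above, it assumes the rows are sorted by 'id' and 'year', and it counts rows rather than calendar years, so it equals the years elapsed only when no year is missing within an id.

```r
id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3)
year  <- c(1989, 1990, 1991, 1992, 1993, 1989, 1990, 1991, 1992, 1991, 1992)
onset <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
df <- data.frame(id, year, onset)

# Run id: increments whenever 'onset' changes value (same as the dplyr 'ind')
run_id <- cumsum(c(TRUE, diff(df$onset) != 0))

# Within each (id, run) group: row number, zeroed on event rows
df$time_since_event <- ave(df$onset, df$id, run_id,
                           FUN = function(x) seq_along(x) * !x)

df$time_since_event  # 1 2 0 1 2 1 0 1 0 1 2
```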