Question

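I have unbalanced panel data with a binary variable indicating whether an event occurred. I want to control for time dependency, so I want to create a variable that gives the number of years that have passed since the last event. The data are organized by dyad-year (one row per id and year).

Here is a reproducible example, including the vector I am trying to create. Thanks!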

   id year onset time_since_event
1   1 1989     0                1
2   1 1990     0                2
3   1 1991     1                0
4   1 1992     0                1
5   1 1993     0                2
6   2 1989     0                1
7   2 1990     1                0
8   2 1991     0                1
9   2 1992     1                0
10  3 1991     0                1
11  3 1992     0                2


id <- c(1,1,1,1,1,2,2,2,2,3,3)
year <- c(1989,1990,1991,1992,1993,1989,1990,1991,1992,1991,1992)
onset <- c(0,0,1,0,0,0,1,0,1,0,0)
time_since_event <- c(1,2,0,1,2,1,0,1,0,1,2) # what I want to create
df <- data.frame(id, year, onset, time_since_event)

Answer 1

We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), then create a run-length ID grouping variable ('ind') from the 'onset' column using rleid. Grouped by 'ind' and 'id', we assign 'time_since_event' as the row sequence for the rows where 'onset' is not equal to 1. In the next step, we replace the 'NA' elements with 0. The outputs shown here assume that df holds only the 'id', 'year' and 'onset' columns.

library(data.table) # v1.9.6+
setDT(df)[, ind := rleid(onset)][onset != 1, time_since_event := 1:.N, by = .(ind, id)][is.na(time_since_event), time_since_event := 0]
df
#    id year onset ind time_since_event
# 1:  1 1989     0   1                1
# 2:  1 1990     0   1                2
# 3:  1 1991     1   2                0
# 4:  1 1992     0   3                1
# 5:  1 1993     0   3                2
# 6:  2 1989     0   3                1
# 7:  2 1990     1   4                0
# 8:  2 1991     0   5                1
# 9:  2 1992     1   6                0
#10:  3 1991     0   7                1
#11:  3 1992     0   7                2

Or it can be made more compact. Grouped by rleid(onset) and the 'id' column, we negate 'onset' (so that 0 becomes TRUE and 1 becomes FALSE), multiply it by the row sequence (1:.N), and assign (:=) the result as the 'time_since_event' column.

setDT(df)[, time_since_event := 1:.N * !onset, by = .(rleid(onset), id)]
df
#    id year onset time_since_event
# 1:  1 1989     0                1
# 2:  1 1990     0                2
# 3:  1 1991     1                0
# 4:  1 1992     0                1
# 5:  1 1993     0                2
# 6:  2 1989     0                1
# 7:  2 1990     1                0
# 8:  2 1991     0                1
# 9:  2 1992     1                0
#10:  3 1991     0                1
#11:  3 1992     0                2

Or we can use dplyr. We group by 'id' and by another variable, 'ind', created by taking the difference of adjacent elements in 'onset' (diff), turning that into a logical index (!= 0), and taking the cumulative sum (cumsum) of the index. Within mutate, we multiply the row sequence (row_number()) by the negated 'onset' (just as before), and then remove the 'ind' column using select.

library(dplyr)
df %>% 
    group_by(id, ind= cumsum(c(TRUE, diff(onset)!=0))) %>% 
    mutate(time_since_event= (!onset) *row_number()) %>%
    ungroup() %>%
    select(-ind) 
#     id  year onset time_since_event
#   (dbl) (dbl) (dbl)            (int)
#1      1  1989     0                1
#2      1  1990     0                2
#3      1  1991     1                0
#4      1  1992     0                1
#5      1  1993     0                2
#6      2  1989     0                1
#7      2  1990     1                0
#8      2  1991     0                1
#9      2  1992     1                0
#10     3  1991     0                1
#11     3  1992     0                2
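
To make the grouping index easier to inspect, here is a minimal base R sketch of the same idea. It is not part of the original answer: `ave()` is swapped in for `group_by()`/`mutate()`, and the name `run` is only an illustrative choice. It assumes the rows are already sorted by id and year, as in the example.

```r
# Example data from the question, without the target column
df <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  year  = c(1989, 1990, 1991, 1992, 1993, 1989, 1990, 1991, 1992, 1991, 1992),
  onset = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
)

# Run index: increases by one every time 'onset' changes value.
# Note that a run can span two ids (rows 4 to 6 here), which is why
# 'id' has to be part of the grouping as well.
run <- cumsum(c(TRUE, diff(df$onset) != 0))
run
# 1 1 2 3 3 3 4 5 6 7 7

# Within each (id, run) group, number the rows; multiplying by !x
# sets the event rows (onset == 1) to 0.
df$time_since_event <- ave(
  df$onset, df$id, run,
  FUN = function(x) seq_along(x) * !x
)
df$time_since_event
# 1 2 0 1 2 1 0 1 0 1 2
```

The printed `run` vector matches the 'ind' column produced by rleid(onset), so the two constructions group the rows identically for this example.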

How to create time since last event in unbalanced panel data in R?
