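I have unbalanced panel data with a binary variable indicating whether an event occurred. I want to control for time dependency, so I want to create a variable indicating the number of years that have passed since the last event. The data are organized by dyad-year.
Here is a reproducible example, including the vector I want to create. Thanks!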
id year onset time_since_event
1 1 1989 0 1
2 1 1990 0 2
3 1 1991 1 0
4 1 1992 0 1
5 1 1993 0 2
6 2 1989 0 1
7 2 1990 1 0
8 2 1991 0 1
9 2 1992 1 0
10 3 1991 0 1
11 3 1992 0 2
id <- c(1,1,1,1,1,2,2,2,2,3,3)
year <- c(1989,1990,1991,1992,1993,1989,1990,1991,1992,1991,1992)
onset <- c(0,0,1,0,0,0,1,0,1,0,0)
time_since_event<-c(1,2,0,1,2,1,0,1,0,1,2) #what I want to create
df <- data.frame(cbind(id, year, onset,time_since_event))
Answer 0 (score: 1)
We could use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), then create a run-length-id grouping variable ('ind') from the 'onset' column using rleid. Grouped by 'ind' and 'id', we assign 'time_since_event' as the row sequence for the rows where 'onset' is not equal to 1. In the next step, we replace the 'NA' elements with 0.
library(data.table) # v1.9.6+
setDT(df)[, ind := rleid(onset)][onset != 1, time_since_event := 1:.N,
          by = .(ind, id)][is.na(time_since_event), time_since_event := 0]
df
# id year onset ind time_since_event
# 1: 1 1989 0 1 1
# 2: 1 1990 0 1 2
# 3: 1 1991 1 2 0
# 4: 1 1992 0 3 1
# 5: 1 1993 0 3 2
# 6: 2 1989 0 3 1
# 7: 2 1990 1 4 0
# 8: 2 1991 0 5 1
# 9: 2 1992 1 6 0
#10: 3 1991 0 7 1
#11: 3 1992 0 7 2
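To see what the grouping variable does, here is a minimal sketch that runs rleid on the 'onset' vector from the question by itself. It assumes data.table 1.9.6 or later is installed; nothing else is needed.

```r
library(data.table)  # rleid() is available from v1.9.6

# 'onset' vector copied from the question
onset <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)

# rleid() gives every run of identical consecutive values its own integer id
ind <- rleid(onset)
print(ind)
# [1] 1 1 2 3 3 3 4 5 6 7 7
```

Run 3 covers the last two rows of id 1 and the first row of id 2, which is why 'id' also has to appear in `by`: otherwise the counter would carry over from one id to the next.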
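Or it can be made more compact. Grouped by rleid(onset) and the 'id' column, we negate 'onset' (so that 0 becomes TRUE and 1 becomes FALSE), multiply it by the row sequence (1:.N), and assign (:=) the result as the 'time_since_event' column.
setDT(df)[, time_since_event := 1:.N * !onset, by = .(rleid(onset), id)]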
df
# id year onset time_since_event
# 1: 1 1989 0 1
# 2: 1 1990 0 2
# 3: 1 1991 1 0
# 4: 1 1992 0 1
# 5: 1 1993 0 2
# 6: 2 1989 0 1
# 7: 2 1990 1 0
# 8: 2 1991 0 1
# 9: 2 1992 1 0
#10: 3 1991 0 1
#11: 3 1992 0 2
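The negate-and-multiply step can be checked on a small vector in base R. This is only an illustration of the arithmetic, not part of the answer's code: here the counter runs over the whole vector, whereas in the data.table call it restarts within each group.

```r
# one id's worth of 'onset' values, taken from the question
onset <- c(0, 0, 1, 0, 0)

# !onset turns 0 into TRUE and any non-zero value into FALSE
keep <- !onset

# in arithmetic TRUE/FALSE act as 1/0, so event rows are zeroed out
res <- seq_along(onset) * keep
print(res)
# [1] 1 2 0 4 5
```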
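Or we can use dplyr. We group by 'id' and by another variable, created by taking the difference of adjacent elements in 'onset' (diff), building a logical index (!= 0), and taking the cumulative sum (cumsum) of that index. Within mutate, we multiply the row sequence (row_number()) by the negated 'onset' (just as before), and remove the 'ind' column using select.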
library(dplyr)
df %>%
group_by(id, ind= cumsum(c(TRUE, diff(onset)!=0))) %>%
mutate(time_since_event= (!onset) *row_number()) %>%
ungroup() %>%
select(-ind)
# id year onset time_since_event
# (dbl) (dbl) (dbl) (int)
#1 1 1989 0 1
#2 1 1990 0 2
#3 1 1991 1 0
#4 1 1992 0 1
#5 1 1993 0 2
#6 2 1989 0 1
#7 2 1990 1 0
#8 2 1991 0 1
#9 2 1992 1 0
#10 3 1991 0 1
#11 3 1992 0 2
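For comparison, the same idea can be sketched in base R with ave(), in case neither package is available. This is not from the answer above; the variable name `run` is just an illustrative choice, and the sketch assumes the rows are already sorted by id and year.

```r
# data from the question
id    <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3)
year  <- c(1989, 1990, 1991, 1992, 1993, 1989, 1990, 1991, 1992, 1991, 1992)
onset <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0)
df <- data.frame(id, year, onset)

# run id: increases by one every time 'onset' changes value
run <- cumsum(c(TRUE, diff(df$onset) != 0))

# within each (id, run) group, count the rows and zero out the event rows
df$time_since_event <- ave(df$onset, df$id, run,
                           FUN = function(x) seq_along(x) * !x)
print(df$time_since_event)
#  [1] 1 2 0 1 2 1 0 1 0 1 2
```

As with the data.table and dplyr versions, rows before an id's first event are counted from the start of that id's observations, and the count is of rows rather than calendar years, so a gap in the years would not be reflected.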