How can I fill in a 'duration' column between a 'start' and an 'end' indicator, as in the example below?
In Stata, it would be:
by id (year), sort: gen duration=1 if start==1
by id (year), sort: replace duration=1 if duration[_n-1]==1 & end!=1
How can I do this in R, possibly using dplyr?
id year start end
1 2000 0 0
1 2001 1 0
1 2002 0 0
1 2003 0 1
1 2004 0 0
2 2000 0 0
2 2001 0 0
2 2002 1 0
2 2003 0 0
2 2004 0 1
The output would be:
id year start end duration
1 2000 0 0 0
1 2001 1 0 1
1 2002 0 0 1
1 2003 0 1 0
1 2004 0 0 0
2 2000 0 0 0
2 2001 0 0 0
2 2002 1 0 1
2 2003 0 0 1
2 2004 0 1 0
Answer 0 (score: 4)
Using dplyr, this seems to do the trick. First, the sample data:
dd<-read.table(text="id year start end
1 2000 0 0
1 2001 1 0
1 2002 0 0
1 2003 0 1
1 2004 0 0
2 2000 0 0
2 2001 0 0
2 2002 1 0
2 2003 0 0
2 2004 0 1", header=TRUE)
Now we just group by id and take the cumulative sum (cumsum) of start - end, which steps up at each start and back down at each end:
library(dplyr)
dd %>% group_by(id) %>% mutate(duration = cumsum(start-end))
# id year start end duration
# (int) (int) (int) (int) (int)
# 1 1 2000 0 0 0
# 2 1 2001 1 0 1
# 3 1 2002 0 0 1
# 4 1 2003 0 1 0
# 5 1 2004 0 0 0
# 6 2 2000 0 0 0
# 7 2 2001 0 0 0
# 8 2 2002 1 0 1
# 9 2 2003 0 0 1
# 10 2 2004 0 1 0
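The cumsum trick works because start - end is +1 on a start row, -1 on an end row and 0 otherwise, so the running total within an id is 1 inside a spell and drops back to 0 on the end row. Since cumsum depends on row order, a slightly more defensive sketch also sorts by year within id first (mirroring Stata's `by id (year), sort`) and keeps the intermediate step visible. It assumes spells within an id never overlap and every end is preceded by a start; otherwise the total can reach 2 or go negative.

```r
library(dplyr)

dd <- read.table(text = "id year start end
1 2000 0 0
1 2001 1 0
1 2002 0 0
1 2003 0 1
1 2004 0 0
2 2000 0 0
2 2001 0 0
2 2002 1 0
2 2003 0 0
2 2004 0 1", header = TRUE)

res <- dd %>%
  arrange(id, year) %>%                # sort by year within id, like Stata's "by id (year), sort"
  group_by(id) %>%
  mutate(step     = start - end,       # +1 on a start row, -1 on an end row, 0 otherwise
         duration = cumsum(step)) %>%  # running total: 1 inside a spell, back to 0 on the end row
  ungroup()

res
```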
Answer 1 (score: 1)
Using logic similar to the code you provided:
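# Load dplyr (provides lag())
library(dplyr)
# Make data
df <- data.frame("id" = c(1,1,1,1,1,2,2,2,2,2),
                 "year" = c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004),
                 "start" = c(0,1,0,0,0,0,0,1,0,0),
                 "end" = c(0,0,0,1,0,0,0,0,0,1))
# Order by id and year
df <- df[order(df$id, df$year), ]
# Make new variable: 1 on every start row
df$duration <- 0
df$duration[df$start == 1 & df$end != 1] <- 1
# Carry a 1 forward to the next row unless that row is an end.
# default = 0 avoids an NA for the first row. Unlike Stata's row-by-row
# replace, this line is vectorized, so it extends each spell by only one
# row, and lag() does not respect id boundaries.
df$duration[dplyr::lag(df$duration, 1, default = 0) == 1 & df$end == 0] <- 1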
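Stata's `replace` processes observations in order, so `duration[_n-1]` already sees values updated earlier in the same pass, which fills spells of any length. A minimal base-R sketch that reproduces that behavior is an explicit loop that also checks that the previous row belongs to the same id. It assumes the data are sorted by id and year, and it uses 0 rather than Stata's missing value for rows outside a spell.

```r
df <- data.frame(id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 year  = c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004),
                 start = c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0),
                 end   = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1))
df <- df[order(df$id, df$year), ]

# Stata: gen duration=1 if start==1 (0 here instead of missing)
df$duration <- ifelse(df$start == 1, 1, 0)

# Stata: by id: replace duration=1 if duration[_n-1]==1 & end!=1
# The loop runs top to bottom, so each row sees the already-updated previous row.
for (i in seq_len(nrow(df))[-1]) {
  same_id <- df$id[i] == df$id[i - 1]
  if (same_id && df$duration[i - 1] == 1 && df$end[i] != 1) {
    df$duration[i] <- 1
  }
}

df
```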
Answer 2 (score: 1)
We can use base R, where df1 is the data frame from the question:
df1$duration <- with(df1, ave(start-end, id, FUN = cumsum))
df1
# id year start end duration
#1 1 2000 0 0 0
#2 1 2001 1 0 1
#3 1 2002 0 0 1
#4 1 2003 0 1 0
#5 1 2004 0 0 0
#6 2 2000 0 0 0
#7 2 2001 0 0 0
#8 2 2002 1 0 1
#9 2 2003 0 0 1
#10 2 2004 0 1 0
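`ave()` splits start - end by id, applies `cumsum` within each group, and writes the results back to the original row positions. As a consequence, the rows only need to be sorted by year within each id; the ids do not have to be contiguous. A minimal sketch illustrating this, re-creating the question's data as df1 (an assumption, since the answer does not define it) and deliberately interleaving the ids:

```r
df1 <- data.frame(id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  year  = c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004),
                  start = c(0, 1, 0, 0, 0, 0, 0, 1, 0, 0),
                  end   = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1))

# Sort by year first, so rows of id 1 and id 2 alternate
df1 <- df1[order(df1$year, df1$id), ]

# cumsum runs within each id, in row (here: year) order, and the results
# are placed back at each row's original position
df1$duration <- with(df1, ave(start - end, id, FUN = cumsum))

# Restore id/year order for display
df1 <- df1[order(df1$id, df1$year), ]
df1
```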