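I want to check whether there are any gaps in a person's eligibility. I define a gap as a date_of_claim that falls more than 30 days after the last elig_end_date. So what I want to do is check that each date_of_claim is no later than the elig_end_date in the immediately preceding row + 30 days. Ideally, I would like an indicator that is 0 where there is no gap and 1 where there is a gap, for each person. Here is a sample df with the solution built in as "gaps".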
names date_of_claim elig_end_date obs gaps
1 tom 2010-01-01 2010-07-01 1 NA
2 tom 2010-05-04 2010-07-01 1 0
3 tom 2010-06-01 2014-01-01 2 0
4 tom 2010-10-10 2014-01-01 2 0
5 mary 2010-03-01 2014-06-14 1 NA
6 mary 2010-05-01 2014-06-14 1 0
7 mary 2010-08-01 2014-06-14 1 0
8 mary 2010-11-01 2014-06-14 1 0
9 mary 2011-01-01 2014-06-14 1 0
10 john 2010-03-27 2011-03-01 1 NA
11 john 2010-07-01 2011-03-01 1 0
12 john 2010-11-01 2011-03-01 1 0
13 john 2011-02-01 2011-03-01 1 0
14 sue 2010-02-01 2010-04-30 1 NA
15 sue 2010-02-27 2010-04-30 1 0
16 sue 2010-03-13 2010-05-31 2 0
17 sue 2010-04-27 2010-06-30 3 0
18 sue 2010-04-27 2010-06-30 3 0
19 sue 2010-05-06 2010-08-31 4 0
20 sue 2010-06-08 2010-09-30 5 0
21 mike 2010-05-01 2010-07-30 1 NA
22 mike 2010-06-01 2010-07-30 1 0
23 mike 2010-11-12 2011-07-30 2 1
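I found this post very useful: How can I compare a value in a column to the previous one using R?, but I feel I can't use a loop because my df has 4 million rows, and I have already had a lot of difficulty trying to run loops on it.
To that end, I think the code I need is something like this:
df$gaps<-ifelse(df$date_of_claim>=df$elig_end_date+30,1,0) ##this doesn't use the preceding row.
I made a clumsy attempt with this: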
df$gaps<-df$date_of_claim>=df$elig_end_date[-1,]
But I get an error saying that I have an incorrect number of dimensions.
Any help is much appreciated! Thanks.

Answer 0 (score: 1)
With four million observations, I would use data.table:
DF <- read.table(text="names date_of_claim elig_end_date obs gaps
1 tom 2010-01-01 2010-07-01 1 NA
2 tom 2010-05-04 2010-07-01 1 0
3 tom 2010-06-01 2014-01-01 2 0
4 tom 2010-10-10 2014-01-01 2 0
5 mary 2010-03-01 2014-06-14 1 NA
6 mary 2010-05-01 2014-06-14 1 0
7 mary 2010-08-01 2014-06-14 1 0
8 mary 2010-11-01 2014-06-14 1 0
9 mary 2011-01-01 2014-06-14 1 0
10 john 2010-03-27 2011-03-01 1 NA
11 john 2010-07-01 2011-03-01 1 0
12 john 2010-11-01 2011-03-01 1 0
13 john 2011-02-01 2011-03-01 1 0
14 sue 2010-02-01 2010-04-30 1 NA
15 sue 2010-02-27 2010-04-30 1 0
16 sue 2010-03-13 2010-05-31 2 0
17 sue 2010-04-27 2010-06-30 3 0
18 sue 2010-04-27 2010-06-30 3 0
19 sue 2010-05-06 2010-08-31 4 0
20 sue 2010-06-08 2010-09-30 5 0
21 mike 2010-05-01 2010-07-30 1 NA
22 mike 2010-06-01 2010-07-30 1 0
23 mike 2010-11-12 2011-07-30 2 1", header=TRUE)
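library(data.table)
DT <- data.table(DF)
# Convert both date columns to Date class so that date arithmetic works
DT[, c("date_of_claim", "elig_end_date") := list(as.Date(date_of_claim), as.Date(elig_end_date))]
# Within each person, compare every claim date (from the second row on) with the
# previous row's elig_end_date + 30 days; the first row has no predecessor, so it is NA
DT[, gaps2:= c(NA, date_of_claim[-1] > head(elig_end_date, -1)+30), by=names]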
# names date_of_claim elig_end_date obs gaps gaps2
# 1: tom 2010-01-01 2010-07-01 1 NA NA
# 2: tom 2010-05-04 2010-07-01 1 0 FALSE
# 3: tom 2010-06-01 2014-01-01 2 0 FALSE
# 4: tom 2010-10-10 2014-01-01 2 0 FALSE
# 5: mary 2010-03-01 2014-06-14 1 NA NA
# 6: mary 2010-05-01 2014-06-14 1 0 FALSE
# 7: mary 2010-08-01 2014-06-14 1 0 FALSE
# 8: mary 2010-11-01 2014-06-14 1 0 FALSE
# 9: mary 2011-01-01 2014-06-14 1 0 FALSE
# 10: john 2010-03-27 2011-03-01 1 NA NA
# 11: john 2010-07-01 2011-03-01 1 0 FALSE
# 12: john 2010-11-01 2011-03-01 1 0 FALSE
# 13: john 2011-02-01 2011-03-01 1 0 FALSE
# 14: sue 2010-02-01 2010-04-30 1 NA NA
# 15: sue 2010-02-27 2010-04-30 1 0 FALSE
# 16: sue 2010-03-13 2010-05-31 2 0 FALSE
# 17: sue 2010-04-27 2010-06-30 3 0 FALSE
# 18: sue 2010-04-27 2010-06-30 3 0 FALSE
# 19: sue 2010-05-06 2010-08-31 4 0 FALSE
# 20: sue 2010-06-08 2010-09-30 5 0 FALSE
# 21: mike 2010-05-01 2010-07-30 1 NA NA
# 22: mike 2010-06-01 2010-07-30 1 0 FALSE
# 23: mike 2010-11-12 2011-07-30 2 1 TRUE
# names date_of_claim elig_end_date obs gaps gaps2
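
The question asked for a 0/1 indicator rather than TRUE/FALSE, and for a way to tell whether each person has any gap at all. A minimal sketch of one way to get both is shown here, assuming a data.table version that provides shift() and assuming the rows are already sorted by date_of_claim within each person. The column names gaps3 and any_gap are made up for illustration, and the sample is a few rows copied from the question's data.

```r
library(data.table)

# A few rows copied from the question's sample (mike and sue)
DT <- data.table(
  names         = c("mike", "mike", "mike", "sue", "sue"),
  date_of_claim = as.Date(c("2010-05-01", "2010-06-01", "2010-11-12",
                            "2010-02-01", "2010-02-27")),
  elig_end_date = as.Date(c("2010-07-30", "2010-07-30", "2011-07-30",
                            "2010-04-30", "2010-04-30"))
)

# shift() lags elig_end_date by one row within each person (first row becomes NA);
# as.integer() turns TRUE/FALSE into 1/0 and keeps NA as NA
# (gaps3 is a hypothetical column name)
DT[, gaps3 := as.integer(date_of_claim > shift(elig_end_date) + 30), by = names]

# Per-person flag: 1 if any row for that person shows a gap, otherwise 0
# (any_gap is a hypothetical column name)
DT[, any_gap := as.integer(any(gaps3 == 1L, na.rm = TRUE)), by = names]

print(DT)
```

If the rows are not guaranteed to be in date order within each person, sorting first (for example with setorder(DT, names, date_of_claim)) would be needed for the lagged comparison to mean "the previous claim".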