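I have pharmacy claims data that lists a fill start date and end date per patient. To make calculations easier, I want a true (1) or false (0) indicator, per patient, of whether each given day falls within a recorded fill period.
Using the sample data below, I am trying to analyze a fixed ten-day observation period from 1/1/2013 to 1/10/2013.
I have played around with ?seq.Date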
Patient_ID  Start_Date  End_Date
a           1/1/2013    1/3/2013
b           1/3/2013    1/8/2013
c           1/1/2013    1/10/2013
d           1/7/2013    1/9/2013
a           1/8/2013    1/9/2013
Desired output:
           a b c d
1/1/2013   1 0 1 0
1/2/2013   1 0 1 0
1/3/2013   1 1 1 0
1/4/2013   0 1 1 0
1/5/2013   0 1 1 0
1/6/2013   0 1 1 0
1/7/2013   0 1 1 1
1/8/2013   1 1 1 1
1/9/2013   1 0 1 1
1/10/2013  0 0 1 0
Answer 0 (score: 5)
Try
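library(data.table)
# Expand each row into one record per covered day; grouping by row number
# as well as Patient_ID keeps repeat fills for the same patient separate
res <- setDT(df1)[, seq(as.Date(Start_Date, '%m/%d/%Y'),
                        as.Date(End_Date, '%m/%d/%Y'), by='day'),
                  by=list(Patient_ID, 1:nrow(df1))]
# Cross-tabulate day (column 3) against Patient_ID (column 1)
table(res[, c(3,1), with=FALSE])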
Or using only base R
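# Build one daily sequence per row of df1
lst <- Map(seq, as.Date(df1$Start_Date, '%m/%d/%Y'),
           as.Date(df1$End_Date, '%m/%d/%Y'), by='day')
# Format the dates back to the input style
lst <- lapply(lst, format, '%m/%d/%Y')
# Cross-tabulate day against patient, repeating each ID once per covered day
table(unlist(lst), rep(df1$Patient_ID, lengths(lst)))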
# a b c d
# 01/01/2013 1 0 1 0
# 01/02/2013 1 0 1 0
# 01/03/2013 1 1 1 0
# 01/04/2013 0 1 1 0
# 01/05/2013 0 1 1 0
# 01/06/2013 0 1 1 0
# 01/07/2013 0 1 1 1
# 01/08/2013 1 1 1 1
# 01/09/2013 1 0 1 1
# 01/10/2013 0 0 1 0
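One thing to keep in mind: table() returns counts, so if a patient ever had two fills covering the same day, that cell would show 2 rather than 1, and only days that appear in at least one fill show up as rows. The following is a minimal base R sketch, not part of the approach above, that fixes the observation window explicitly and forces a 0/1 result. It assumes the window is the ten days from the question; the names `days`, `patients`, and `indicator` are illustrative only.

```r
# Sample data from the question
df1 <- data.frame(
  Patient_ID = c("a", "b", "c", "d", "a"),
  Start_Date = c("1/1/2013", "1/3/2013", "1/1/2013", "1/7/2013", "1/8/2013"),
  End_Date   = c("1/3/2013", "1/8/2013", "1/10/2013", "1/9/2013", "1/9/2013"),
  stringsAsFactors = FALSE
)

start <- as.Date(df1$Start_Date, "%m/%d/%Y")
end   <- as.Date(df1$End_Date, "%m/%d/%Y")

# Observation window (assumption: the ten days named in the question)
days <- seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = "day")

patients <- sort(unique(df1$Patient_ID))

# One column per patient; a day is 1 if it falls inside ANY of that
# patient's fills, so overlapping fills still give 1 rather than 2
indicator <- sapply(patients, function(p) {
  rows <- df1$Patient_ID == p
  covered <- vapply(seq_along(days), function(i) {
    any(start[rows] <= days[i] & days[i] <= end[rows])
  }, logical(1))
  as.integer(covered)
})
rownames(indicator) <- format(days, "%m/%d/%Y")

indicator
```

Because the rows come from `days` rather than from the fills themselves, a day with no coverage for any patient still appears as a row of zeros.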
# Sample data used above
df1 <- data.frame(
  Patient_ID = c("a", "b", "c", "d", "a"),
  Start_Date = c("1/1/2013", "1/3/2013", "1/1/2013", "1/7/2013", "1/8/2013"),
  End_Date   = c("1/3/2013", "1/8/2013", "1/10/2013", "1/9/2013", "1/9/2013"),
  stringsAsFactors = FALSE
)