df
Patient_ID ADM_DATE DIS_DATE
278328 4/17/2007 4/19/2007
279347 2/6/2012 2/7/2012
279347 2/28/2012 3/3/2012
287171 1/11/2012 1/14/2012
287171 1/23/2013 2/4/2013
353079 7/12/2011 7/15/2011
608639 10/5/2010 10/7/2010
608639 2/16/2012 2/19/2012
608639 5/2/2012 5/4/2012
608639 11/27/2012 12/4/2012
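I need to find the time until a given patient's next admission. I need to add a new column that, for each Patient_ID, subtracts the DIS_DATE of the previous row from the ADM_DATE of the current row.
My final product should look like this: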
df1
Patient_ID ADM_DATE DIS_DATE Time_to_readmission
278328 4/17/2007 4/19/2007 NA
279347 2/6/2012 2/7/2012 NA
279347 2/28/2012 3/3/2012 21
287171 1/11/2012 1/14/2012 NA
287171 1/23/2013 2/4/2013 375
353079 7/12/2011 7/15/2011 NA
608639 10/5/2010 10/7/2010 NA
608639 2/16/2012 2/19/2012 497
608639 5/2/2012 5/4/2012 73
608639 11/27/2012 12/4/2012 207
Please help me with the code I need; I am fairly new to R. Thanks in advance.
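Answer 0 (score: 3)
Here is a quick data.table implementation. First we convert the date columns to a proper Date class, then we compute ADM_DATE - shift(DIS_DATE) by Patient_ID and wrap it in as.integer (or not), since it seems you want an integer class rather than a difftime.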
library(data.table)
setDT(df)[, 2:3 := lapply(.SD, as.IDate, "%m/%d/%Y"), .SDcols = -1]
df[, Diff := as.integer(ADM_DATE - shift(DIS_DATE)), by = Patient_ID]
df
# Patient_ID ADM_DATE DIS_DATE Diff
# 1: 278328 2007-04-17 2007-04-19 NA
# 2: 279347 2012-02-06 2012-02-07 NA
# 3: 279347 2012-02-28 2012-03-03 21
# 4: 287171 2012-01-11 2012-01-14 NA
# 5: 287171 2013-01-23 2013-02-04 375
# 6: 353079 2011-07-12 2011-07-15 NA
# 7: 608639 2010-10-05 2010-10-07 NA
# 8: 608639 2012-02-16 2012-02-19 497
# 9: 608639 2012-05-02 2012-05-04 73
# 10: 608639 2012-11-27 2012-12-04 207
With dplyr:
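library(dplyr)
df %>%
  # mutate_each() and funs() are deprecated in current dplyr; across() replaces them
  mutate(across(c(ADM_DATE, DIS_DATE), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  group_by(Patient_ID) %>%
  mutate(Diff = as.integer(ADM_DATE - lag(DIS_DATE)))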
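Both snippets above rely on the same idea: within each patient, move the discharge dates down by one row and subtract them from the admission dates. As a rough illustration of that step without either package, here is a minimal base R sketch. It rebuilds the sample data from the question and assumes the rows are already sorted by admission date within each patient. The name prev_dis is only illustrative.

```r
# Sample data copied from the question.
df <- data.frame(
  Patient_ID = c(278328, 279347, 279347, 287171, 287171,
                 353079, 608639, 608639, 608639, 608639),
  ADM_DATE = c("4/17/2007", "2/6/2012", "2/28/2012", "1/11/2012", "1/23/2013",
               "7/12/2011", "10/5/2010", "2/16/2012", "5/2/2012", "11/27/2012"),
  DIS_DATE = c("4/19/2007", "2/7/2012", "3/3/2012", "1/14/2012", "2/4/2013",
               "7/15/2011", "10/7/2010", "2/19/2012", "5/4/2012", "12/4/2012"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into Date objects.
df$ADM_DATE <- as.Date(df$ADM_DATE, format = "%m/%d/%Y")
df$DIS_DATE <- as.Date(df$DIS_DATE, format = "%m/%d/%Y")

# Within each patient, shift the discharge dates down by one row
# (the job done by shift() and lag() above), working on plain day counts.
# Assumption: rows are already in admission order within each patient.
prev_dis <- ave(as.numeric(df$DIS_DATE), df$Patient_ID,
                FUN = function(d) c(NA, head(d, -1)))

# Days between the previous discharge and the current admission.
df$Time_to_readmission <- as.integer(as.numeric(df$ADM_DATE) - prev_dis)
df
```

The first row of every patient has no previous discharge, so it ends up as NA, which matches the desired output in the question.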
Answer 1 (score: 0)
OK, I misunderstood it the first time I read it, but this time it should work.
Code:
LEFT JOIN ospos_people ON ospos_people.person_id = ospos_sales.customer_id
LEFT JOIN ospos_people ON ospos_people.person_id = ospos_sales.vehicle_id
Answer 2 (score: 0)
I would first convert the date columns of the data.frame:
df[ , 2] <- as.Date(df[ , 2], format = "%m/%d/%Y")
df[ , 3] <- as.Date(df[ , 3], format = "%m/%d/%Y")
Then split it by patient (for each unique patient ID, pick out the rows that have that ID):
dfList <- lapply(unique(df$Patient_ID), function(x) df[which(df$Patient_ID == x), ])
Now dfList is a list with one data.frame per item. Next, go through each list item:
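dfList2 <- lapply(dfList, function(x) {
  if (nrow(x) > 1) {
    Diff <- c(NA)  # the first stay has no previous discharge
    for (i in 2:nrow(x)) {
      # days from the previous discharge to this admission
      Diff[i] <- difftime(x[i, 2], x[i - 1, 3], units = "days")
    }
    cbind(x, Time_to_readmission = Diff)
  } else {
    cbind(x, Time_to_readmission = NA)
  }
})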
Now put it all together:
do.call("rbind", dfList2)
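The split, loop, and rbind steps above can probably be condensed. The following is only a sketch under a few assumptions: add_readmission is a hypothetical helper name, the date columns are already of class Date, and each patient's rows are re-sorted by admission date inside the helper to be safe.

```r
# Hypothetical helper: add Time_to_readmission to a data.frame whose
# ADM_DATE and DIS_DATE columns are already of class Date.
add_readmission <- function(df) {
  pieces <- split(df, df$Patient_ID)   # one data.frame per patient
  pieces <- lapply(pieces, function(x) {
    x <- x[order(x$ADM_DATE), ]        # make sure stays are in time order
    # previous discharge date for each stay; NA for the first stay
    prev_dis <- c(as.Date(NA), head(x$DIS_DATE, -1))
    x$Time_to_readmission <- as.integer(x$ADM_DATE - prev_dis)
    x
  })
  out <- do.call(rbind, pieces)        # stack the pieces back together
  rownames(out) <- NULL
  out
}

# Small example with two patients taken from the question.
stays <- data.frame(
  Patient_ID = c(278328, 279347, 279347),
  ADM_DATE = as.Date(c("2007-04-17", "2012-02-06", "2012-02-28")),
  DIS_DATE = as.Date(c("2007-04-19", "2012-02-07", "2012-03-03"))
)
res <- add_readmission(stays)
res$Time_to_readmission
```

Note that split() returns the pieces ordered by Patient_ID, so the result comes back grouped by patient even if the input was not.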
This may not be the most elegant way, but I think it works and is easy to follow.