Subtracting the previous row's value from the current row in R

Time: 2016-01-19 16:09:25

Tags: r

df

Patient_ID  ADM_DATE    DIS_DATE
278328  4/17/2007   4/19/2007
279347  2/6/2012    2/7/2012
279347  2/28/2012   3/3/2012
287171  1/11/2012   1/14/2012
287171  1/23/2013   2/4/2013
353079  7/12/2011   7/15/2011
608639  10/5/2010   10/7/2010
608639  2/16/2012   2/19/2012
608639  5/2/2012    5/4/2012
608639  11/27/2012  12/4/2012
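For anyone who wants to run the answers below, the sample data can be recreated roughly as follows. This is only a sketch: it assumes the two date columns were read in as character strings, which the question does not actually state.

```r
# Recreate the sample data from the question.
# Assumption: dates are stored as character strings (not stated in the question).
df <- data.frame(
  Patient_ID = c(278328, 279347, 279347, 287171, 287171,
                 353079, 608639, 608639, 608639, 608639),
  ADM_DATE = c("4/17/2007", "2/6/2012", "2/28/2012", "1/11/2012", "1/23/2013",
               "7/12/2011", "10/5/2010", "2/16/2012", "5/2/2012", "11/27/2012"),
  DIS_DATE = c("4/19/2007", "2/7/2012", "3/3/2012", "1/14/2012", "2/4/2013",
               "7/15/2011", "10/7/2010", "2/19/2012", "5/4/2012", "12/4/2012"),
  stringsAsFactors = FALSE
)

# Inspect the column types before converting anything
str(df)
```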

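I need to find the time until a given patient's next admission. To do that, I need to add a new column that, within each Patient_ID, subtracts the DIS_DATE of the previous row from the ADM_DATE of the current row.

My final result should look like this: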

df1

Patient_ID  ADM_DATE    DIS_DATE    Time_to_readmission 
278328  4/17/2007   4/19/2007   NA
279347  2/6/2012    2/7/2012    NA
279347  2/28/2012   3/3/2012    21
287171  1/11/2012   1/14/2012   NA
287171  1/23/2013   2/4/2013    375
353079  7/12/2011   7/15/2011   NA
608639  10/5/2010   10/7/2010   NA
608639  2/16/2012   2/19/2012   497
608639  5/2/2012    5/4/2012    73
608639  11/27/2012  12/4/2012   207

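Please help me with the code for this; I am fairly new to R. Thanks in advance.

3 answers:

Answer 0: (score: 3)

Here is a quick data.table implementation. First we convert the columns to a proper Date class, then we compute ADM_DATE - shift(DIS_DATE) by Patient_ID and wrap the result in as.integer (or not), since it seems you want an integer column rather than a difftime.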

library(data.table)
setDT(df)[, 2:3 := lapply(.SD, as.IDate, "%m/%d/%Y"), .SDcols = -1]
df[, Diff := as.integer(ADM_DATE - shift(DIS_DATE)), by = Patient_ID]
df
#     Patient_ID   ADM_DATE   DIS_DATE Diff
#  1:     278328 2007-04-17 2007-04-19   NA
#  2:     279347 2012-02-06 2012-02-07   NA
#  3:     279347 2012-02-28 2012-03-03   21
#  4:     287171 2012-01-11 2012-01-14   NA
#  5:     287171 2013-01-23 2013-02-04  375
#  6:     353079 2011-07-12 2011-07-15   NA
#  7:     608639 2010-10-05 2010-10-07   NA
#  8:     608639 2012-02-16 2012-02-19  497
#  9:     608639 2012-05-02 2012-05-04   73
# 10:     608639 2012-11-27 2012-12-04  207
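To make the role of shift(DIS_DATE) more concrete, here is a base R sketch of the same lag-by-one idea for a single patient (608639, using the dates from the question). The variable names adm, dis, prev_dis and gap are made up for illustration; the c(NA, head(x, -1)) construction only approximates what shift() does within one group.

```r
# Admissions and discharges for one patient, taken from the question's data
adm <- as.Date(c("10/5/2010", "2/16/2012", "5/2/2012", "11/27/2012"),
               format = "%m/%d/%Y")
dis <- as.Date(c("10/7/2010", "2/19/2012", "5/4/2012", "12/4/2012"),
               format = "%m/%d/%Y")

# Lag the discharge dates by one row: NA first, then every value except the last
prev_dis <- c(as.Date(NA), head(dis, -1))

# Days between the previous discharge and the current admission
gap <- as.integer(adm - prev_dis)
gap
# [1]  NA 497  73 207
```

The first row has no previous discharge, so it comes out as NA, matching the expected Time_to_readmission column in the question.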

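The same idea with dplyr:

library(dplyr)
df %>%
  # convert every column except Patient_ID to Date
  # (across() replaces the deprecated mutate_each()/funs(); needs dplyr >= 1.0.0)
  mutate(across(-Patient_ID, ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  group_by(Patient_ID) %>%
  # days since the previous discharge within each patient
  mutate(Diff = as.integer(ADM_DATE - lag(DIS_DATE)))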

Answer 1: (score: 0)

OK, I misunderstood the question the first time I read it, but this time it should work.

Code:

LEFT JOIN ospos_people ON ospos_people.person_id = ospos_sales.customer_id
LEFT JOIN ospos_people ON ospos_people.person_id = ospos_sales.vehicle_id 

Answer 2: (score: 0)

I would first convert the date columns of the data.frame:

df[ , 2] <- as.Date(df[ , 2], format = "%m/%d/%Y")
df[ , 3] <- as.Date(df[ , 3], format = "%m/%d/%Y")

Then split it by patient (for each unique Patient_ID, select the rows that belong to that ID):

dfList <- lapply(unique(df$Patient_ID), function(x) df[which(df$Patient_ID == x), ])
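As an aside, the lapply()/unique() step above could probably also be written with base R's split(), which builds the same kind of per-patient list in one call. The sketch below uses a small stand-in data.frame (df_small, a hypothetical name) rather than the full df. One difference to be aware of: split() orders the pieces by sorted ID, whereas unique() keeps order of first appearance.

```r
# Small stand-in for the question's df (two patients only, for brevity)
df_small <- data.frame(
  Patient_ID = c(279347, 279347, 353079),
  ADM_DATE = as.Date(c("2/6/2012", "2/28/2012", "7/12/2011"), format = "%m/%d/%Y"),
  DIS_DATE = as.Date(c("2/7/2012", "3/3/2012", "7/15/2011"), format = "%m/%d/%Y")
)

# split() returns a named list with one data.frame per Patient_ID
pieces <- split(df_small, df_small$Patient_ID)

names(pieces)         # the patient IDs, as character
sapply(pieces, nrow)  # number of admissions per patient
```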

Now dfList is a list with one data.frame per patient. Next, process each list item:

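dfList2 <- lapply(dfList, function(x){
  if (nrow(x) > 1){
    # first admission has no previous discharge
    Diff <- c(NA)
    for (i in 2:nrow(x)){
        # days from the previous discharge (column 3) to this admission (column 2);
        # units are fixed to "days" so the result never depends on automatic unit choice
        Diff[i] <- as.numeric(difftime(x[i, 2], x[i - 1, 3], units = "days"))
    }
    cbind(x, Time_to_readmission = Diff)
  } else {
    # single admission: nothing to compare against
    cbind(x, Time_to_readmission = NA)
  }
})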

Now put it all back together:

do.call("rbind", dfList2)

This may not be the most elegant way, but I think it works and is easy to follow.
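For comparison, the split / loop / rbind steps can likely be collapsed into a single grouped call with base R's ave(). This is a sketch rather than part of the original answer: df2 and prev_dis are hypothetical names, only the first three patients are used, and, like the answers above, it assumes the rows are already sorted by admission date within each patient.

```r
# First three patients from the question, with dates already converted
df2 <- data.frame(
  Patient_ID = c(278328, 279347, 279347, 287171, 287171),
  ADM_DATE = as.Date(c("4/17/2007", "2/6/2012", "2/28/2012", "1/11/2012", "1/23/2013"),
                     format = "%m/%d/%Y"),
  DIS_DATE = as.Date(c("4/19/2007", "2/7/2012", "3/3/2012", "1/14/2012", "2/4/2013"),
                     format = "%m/%d/%Y")
)

# Previous row's discharge date as a day count, restarted for every patient.
# Dates are converted to numeric first so ave() only has to handle plain numbers.
prev_dis <- ave(as.numeric(df2$DIS_DATE), df2$Patient_ID,
                FUN = function(d) c(NA_real_, head(d, -1)))

# Days from the previous discharge to the current admission
df2$Time_to_readmission <- as.numeric(df2$ADM_DATE) - prev_dis
df2$Time_to_readmission
# [1]  NA  NA  21  NA 375
```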