Readmission rate in R

Time: 2018-01-18 22:19:16

Tags: r

I'd like to build on this question, which was asked a few months ago.

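I have some patient admission data, and I want to check each encounter to see whether it is a readmission. If the encounter type is Inpatient, I want to look back 30 days to see whether the same patient had another inpatient encounter. If so, I want to fill a 30-day readmission column with "yes"; otherwise it should be "no". I've included sample data showing what I need. Each patient has a unique MRN that never changes, but a patient can have many encounter numbers, one per visit. "df2" is what I want to create after evaluating "df". Any help is appreciated. I'm not sure whether the setup from the previous post would work here.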

csn <- c("11111","22222","33333","44444","55555","66666","77777")
mrn <- c("44322","81433","56311","44322","55121","61776","44322")
admit_date <- c("2017-02-01","2017-02-02","2017-02-04","2017-02-10","2017-02-12","2017-02-14","2017-02-18")
disch_date <- c("2017-02-03","2017-02-04","2017-02-04","2017-02-10","2017-02-16","2017-02-14","2017-02-25")
encounter_type <- c("Inpatient","Inpatient","Observation","ER","Inpatient","Observation","Inpatient")
readmission_30day <- c("no","no","no","no","no","no","yes")
df <- data.frame(csn,mrn,admit_date,disch_date,encounter_type)
df2 <- data.frame(csn,mrn,admit_date,disch_date,encounter_type,readmission_30day)

df
df2

1 answer:

Answer 0: (score: 1)

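OK, I have an answer that uses the dplyr package. I'll try to explain what is going on, but you may need to read up a bit. If you don't know the pipe operator (%>%), just read it as "then".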
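As a quick illustration of reading the pipe as "then", here is a minimal sketch. The vector `x` and the variable names are made up for the example and are not part of the question's data.

```r
library(dplyr)  # attaching dplyr makes the %>% pipe available

x <- c(4, 9, 16)  # hypothetical example values

# Nested call: has to be read from the inside out
nested <- sum(sqrt(x))

# Piped call: reads left to right as "take x, then sqrt, then sum"
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)
```

Both forms compute the same thing; the pipe only changes how the steps are written down.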

# we need 2 libraries
library(tidyverse) # this provides several useful packages
library(lubridate) # this lets you deal more easily with dates

# first, we create a table that contains the csn of the relevant cases
# and the days since last admission

parkDf <- df %>% # we take our data frame
    mutate(admit_date = as_date(admit_date)) %>% # turn the dates into date format
    filter(encounter_type == "Inpatient") %>% # keep only the "Inpatient" encounters
    arrange(mrn, admit_date) %>% # sort them first by mrn and then by admit_date
    group_by(mrn) %>% # group them by mrn so we can for each patient...
    mutate(daysSinceLastAdmit = admit_date - lag(admit_date)) %>% # ...get the days since last admit
    mutate(daysSinceLastAdmit = as.integer(daysSinceLastAdmit)) %>% # turn this into an integer
    ungroup() %>% # drop the grouping so later steps are not applied per patient
    select(csn, daysSinceLastAdmit) # and keep only these two columns

# now we left-join this to our original dataframe
df %>%
    left_join(parkDf, by = "csn") %>%
    mutate(readmission_30day = if_else(!is.na(daysSinceLastAdmit) & daysSinceLastAdmit <= 30, "yes", "no")) %>% # "yes" only if a prior inpatient admit was within 30 days
    select(-daysSinceLastAdmit) # and remove the unwanted one
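
The code above measures the gap from the previous admission date. Readmission windows are often counted from the previous discharge instead, so here is a sketch of that variant. It rests on assumptions: `inpatient_gaps`, `days_since_discharge`, `result` and `expected` are names invented for this example, and the 30-day cutoff is taken from the question. It rebuilds the question's sample data so that it runs on its own, then compares the result with the column the asker wants in "df2".

```r
library(dplyr)

# Sample data from the question, kept as character columns
df <- data.frame(
  csn = c("11111","22222","33333","44444","55555","66666","77777"),
  mrn = c("44322","81433","56311","44322","55121","61776","44322"),
  admit_date = c("2017-02-01","2017-02-02","2017-02-04","2017-02-10",
                 "2017-02-12","2017-02-14","2017-02-18"),
  disch_date = c("2017-02-03","2017-02-04","2017-02-04","2017-02-10",
                 "2017-02-16","2017-02-14","2017-02-25"),
  encounter_type = c("Inpatient","Inpatient","Observation","ER",
                     "Inpatient","Observation","Inpatient"),
  stringsAsFactors = FALSE
)

# Assumption: the window runs from the previous inpatient discharge
# to the current inpatient admission
inpatient_gaps <- df %>%
  filter(encounter_type == "Inpatient") %>%
  mutate(admit_date = as.Date(admit_date),
         disch_date = as.Date(disch_date)) %>%
  arrange(mrn, admit_date) %>%
  group_by(mrn) %>%
  mutate(days_since_discharge =
           as.integer(admit_date - dplyr::lag(disch_date))) %>%
  ungroup() %>%
  select(csn, days_since_discharge)

# Non-inpatient rows and first inpatient stays get NA, which maps to "no"
result <- df %>%
  left_join(inpatient_gaps, by = "csn") %>%
  mutate(readmission_30day = if_else(
    !is.na(days_since_discharge) & days_since_discharge <= 30,
    "yes", "no")) %>%
  select(-days_since_discharge)

# The column the asker listed for df2
expected <- c("no","no","no","no","no","no","yes")
identical(result$readmission_30day, expected)
```

On this sample both definitions flag the same encounter, but they can disagree when a stay is long, because the gap from admission is larger than the gap from discharge.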

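If you google "R for data science" or "r4ds", you can read more about the functions used here. It is a book by Hadley Wickham, who is a great author. If you don't know what a left join is, just google "sql left join". Basically, it takes the right-hand table (our newly created data frame) and adds its information to the left-hand table (the original data frame), matching rows on a column that exists in both data frames (the "by" argument). Rows on the left that have no match on the right get NA in the added columns.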
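
To make the left join concrete, here is a small sketch with made-up tables. `visits`, `gaps`, `ward` and `days` are hypothetical names used only for illustration.

```r
library(dplyr)

# Left table (hypothetical): every row here is kept by the join
visits <- data.frame(csn = c("A", "B", "C"),
                     ward = c("W1", "W2", "W3"),
                     stringsAsFactors = FALSE)

# Right table (hypothetical): only one of the keys is present
gaps <- data.frame(csn = "C", days = 15L, stringsAsFactors = FALSE)

joined <- left_join(visits, gaps, by = "csn")
joined
# csn "A" and "B" have no match in gaps, so their days value is NA
```

This is why the answer can test the joined column with is.na(): rows without a matching inpatient record carry NA after the join.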

Hope this helps.