Create a 0/1 column based on inequalities between three date columns

Time: 2017-03-23 20:59:37

Tags: r survival-analysis

I would like to create a column of 0s and 1s based on inequalities between three date columns.

The idea is as follows. If event_date occurs before death_date or study_over, the event column should be == 1; if event_date occurs after death_date or study_over, event should be == 0. Both event_date and death_date may contain NA.

set.seed(1337)
rand_dates <- Sys.Date() - 365:1

df <- 
data.frame(
   event_date = sample(rand_dates, 20),
   death_date = sample(rand_dates, 20),
   study_over = sample(rand_dates, 20)
)

My attempt is the following:

eventR <- 
    function(x, y, z){
    if(is.na(y)){
        ifelse(x <= z, 1, 0)
    } else if(y <= z){
        ifelse(x < y, 1, 0)
    } else {
        ifelse(x <= z, 1, 0)
    }
    }

I use it in the following way:

library(dplyr)
df[c(3, 5, 7), "event_date"] <- NA #there are some NA in .$event_date
df[c(3, 4, 6), "death_date"] <- NA #there are some NA in .$death_date

df %>%
mutate(event = sapply(.$event_date, eventR, y = .$death_date, z = .$study_over))
##Error: wrong result size (400), expected 20 or 1
##In addition: There were 40 warnings (use warnings() to see them)

I can't figure out how to do this. Any suggestions?

2 answers:

Answer 0: (score: 3)

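This appears to build a binary column (with NA where needed) in which 1 means "event_date is before either death_date or study_over" and 0 is used elsewhere. As has already been pointed out, your specification does not cover every case: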

df$event <- with(df, as.numeric( event_date < pmax( death_date , study_over) ) )
df
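
To see how that one-liner treats missing values, here is a minimal sketch on a small hand-made data frame. The name toy and its dates are made up for illustration and are not from the question. pmax() picks the later of the two dates and returns NA when either input is NA, unless na.rm = TRUE is passed, so rows with a missing death_date come out as NA in the first column and are compared against study_over alone in the second.

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(
  event_date = as.Date(c("2017-01-10", "2017-03-01", "2017-01-10", NA)),
  death_date = as.Date(c("2017-02-01", "2017-02-01", NA, "2017-02-01")),
  study_over = as.Date(c("2017-01-20", "2017-01-20", "2017-01-20", "2017-01-20"))
)

# Same expression as in the answer: NA in either date propagates
toy$event <- with(toy, as.numeric(event_date < pmax(death_date, study_over)))

# Variant (an assumption about what may be wanted): ignore a missing
# death_date and compare against study_over only
toy$event_narm <- with(
  toy,
  as.numeric(event_date < pmax(death_date, study_over, na.rm = TRUE))
)

toy
```

A missing event_date still yields NA in both columns, because the comparison itself has nothing to compare.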

Answer 1: (score: 1)

You could use pmap_dbl() from the purrr package instead of sapply ...

library(dplyr)
library(purrr)

df %>%
    mutate(event = pmap_dbl(list(event_date, death_date, study_over), eventR))

   event_date death_date study_over event
1  2016-10-20 2017-01-27 2016-12-16     1
2  2016-10-15 2016-12-12 2017-01-20     1
3        <NA>       <NA> 2016-10-09    NA
4  2016-09-04       <NA> 2016-11-17     1
5        <NA> 2016-10-13 2016-06-09    NA
6  2016-07-21       <NA> 2016-04-26     0
7        <NA> 2017-02-21 2016-07-12    NA
8  2016-07-02 2017-02-08 2016-08-24     1
9  2016-06-19 2016-09-07 2016-04-11     0
10 2016-05-14 2017-03-13 2016-08-03     1
11 2017-03-06 2017-02-05 2017-02-28     0
12 2017-03-10 2016-04-28 2016-11-30     0
13 2017-01-10 2016-12-10 2016-10-27     0
14 2016-05-31 2016-06-12 2016-08-13     1
15 2017-03-03 2016-12-25 2016-12-20     0
16 2016-04-01 2016-11-03 2016-06-30     1
17 2017-02-26 2017-02-25 2016-05-12     0
18 2017-02-08 2016-12-08 2016-10-14     0
19 2016-07-19 2016-07-03 2016-09-22     0
20 2016-06-17 2016-06-06 2016-11-09     0

You may also be interested in the dplyr function case_when() for handling many if else statements.
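
As a rough sketch of how chained if/else logic like eventR could be written with dplyr's case_when(), the three branches can be expressed as vectorized conditions, so no sapply or pmap call is needed. The data frame toy, the result res, and the dates are hypothetical and only there to make the example self-contained. The branch order mirrors the function in the question, and this is one possible reading of the rule, not a definitive implementation.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  event_date = as.Date(c("2017-01-10", "2017-03-01", "2017-01-10", NA, "2017-01-15")),
  death_date = as.Date(c("2017-02-01", "2017-02-01", NA, "2017-02-01", "2017-01-12")),
  study_over = as.Date(rep("2017-01-20", 5))
)

res <- toy %>%
  mutate(event = case_when(
    # no death recorded: the event counts if it happens by the end of the study
    is.na(death_date)        ~ as.numeric(event_date <= study_over),
    # death within the study: the event must come strictly before death
    death_date <= study_over ~ as.numeric(event_date < death_date),
    # death after the study ended: compare against the study end
    TRUE                     ~ as.numeric(event_date <= study_over)
  ))

res
```

case_when() treats an NA condition as not matched and moves on to the next branch, while an NA in event_date still propagates to the result through the comparison on the right-hand side.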