Handling a data.frame with "date" columns that contain NULL values

Posted: 2013-05-26 11:45:12

Tags: r dataframe

I want to compare two 'date string' columns, like this:

df$inpatient.death = (df$date.of.death==df$date.of.discharge)

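However, the NULL values seem to stop me from converting the columns with as.Date, and comparing with as.character(..) == as.character(..) fails because the two columns use different formats. What is the best way to create the following?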
                                                    THIS IS THE AIM:
  id           date.of.death date.of.discharge    [ inpatient.death ]
1  1 2012-01-01 00:00:00.000        2012-01-01    [            TRUE ]
2  2                    NULL        2012-01-01    [           FALSE ]
3  3 2012-01-02 00:00:00.000        2012-01-01    [           FALSE ]

df <- data.frame(id=1:3, date.of.death=c("2012-01-01 00:00:00.000", "NULL", "2012-01-02 00:00:00.000"), date.of.discharge=c("2012-01-01", "2012-01-01", "2012-01-01"))

What is the best way to do this?
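A minimal sketch of why the direct comparison does not give the target column, assuming the columns are plain character vectors (`stringsAsFactors = FALSE`; in R before 4.0.0, `data.frame()` made them factors by default, and `==` on factors with different level sets stops with an error instead). The raw strings never match, and once parsed, the "NULL" entry turns into NA, which then propagates through `==`:

```r
# Same example data as in the question; stringsAsFactors = FALSE keeps the
# columns as character vectors regardless of the R version.
df <- data.frame(
  id = 1:3,
  date.of.death = c("2012-01-01 00:00:00.000", "NULL", "2012-01-02 00:00:00.000"),
  date.of.discharge = c("2012-01-01", "2012-01-01", "2012-01-01"),
  stringsAsFactors = FALSE
)

# Comparing the raw strings is FALSE everywhere: "2012-01-01 00:00:00.000"
# is not the same string as "2012-01-01", even though the dates agree.
raw_equal <- df$date.of.death == df$date.of.discharge
print(raw_equal)

# Parsing with an explicit format keeps only the date part (strptime ignores
# the trailing time); "NULL" cannot be parsed and becomes NA, and NA
# propagates through ==.
death <- as.Date(df$date.of.death, format = "%Y-%m-%d")
discharge <- as.Date(df$date.of.discharge, format = "%Y-%m-%d")
parsed_equal <- death == discharge
print(parsed_equal)
```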

1 Answer:

Answer 0 (score: 1)

df <- data.frame(id=1:3, date.of.death=c("2012-01-01 00:00:00.000", "NULL", "2012-01-02 00:00:00.000"),
                 date.of.discharge=c("2012-01-01", "2012-01-01", "2012-01-01"))

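# date.of.death begins with a standard yyyy-mm-dd date, so no format needs to be specified;
# the string "NULL" cannot be parsed and becomes NA
df$inpatient.death <- as.Date(df$date.of.death) == as.Date(df$date.of.discharge)
# NA means there is no date of death to compare, so count it as FALSE
df$inpatient.death[is.na(df$inpatient.death)] <- FALSE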

> df
  id           date.of.death date.of.discharge inpatient.death
1  1 2012-01-01 00:00:00.000        2012-01-01            TRUE
2  2                    NULL        2012-01-01           FALSE
3  3 2012-01-02 00:00:00.000        2012-01-01           FALSE
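If the data is read from a file in which missing values are written as the literal text NULL (as in many database exports), another option is to turn them into NA while reading, via the `na.strings` argument of `read.csv`. A sketch, where the inline CSV text is a hypothetical stand-in for the real file:

```r
# Hypothetical CSV text standing in for the real file; in practice this would
# be something like read.csv("your_file.csv", na.strings = "NULL").
csv_text <- "id,date.of.death,date.of.discharge
1,2012-01-01 00:00:00.000,2012-01-01
2,NULL,2012-01-01
3,2012-01-02 00:00:00.000,2012-01-01"

# na.strings = "NULL" turns the literal string NULL into NA while reading.
df <- read.csv(text = csv_text, na.strings = "NULL", stringsAsFactors = FALSE)

death <- as.Date(df$date.of.death, format = "%Y-%m-%d")
discharge <- as.Date(df$date.of.discharge, format = "%Y-%m-%d")

# A missing date of death can never match the discharge date, so
# !is.na(death) forces those rows to FALSE (FALSE & NA is FALSE in R).
df$inpatient.death <- !is.na(death) & death == discharge
print(df)
```

Writing `!is.na(death) & death == discharge` avoids the separate clean-up step, because `FALSE & NA` evaluates to `FALSE` in R.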

# You can also define a helper function for this task

`==2` <- function(x, y){
  res <- x == y
  res[is.na(res)] <- FALSE  # treat NA (e.g. a missing date) as "not equal"
  res
}

df$inpatient.death <- `==2`(as.Date(df$date.of.death),as.Date(df$date.of.discharge))

> df
  id           date.of.death date.of.discharge inpatient.death
1  1 2012-01-01 00:00:00.000        2012-01-01            TRUE
2  2                    NULL        2012-01-01           FALSE
3  3 2012-01-02 00:00:00.000        2012-01-01           FALSE
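The helper above has to be called in prefix form with backticks. R also allows user-defined infix operators named `%...%`, so the same idea can be written as a hypothetical `%==%` operator (the name is only an illustration). This sketch uses `(x == y) %in% TRUE`, which maps TRUE to TRUE and both FALSE and NA to FALSE:

```r
# Hypothetical infix operator: element-wise ==, with NA results mapped to FALSE.
`%==%` <- function(x, y) {
  # %in% never returns NA: TRUE stays TRUE, FALSE and NA both become FALSE.
  # The parentheses matter, because %in% binds more tightly than ==.
  (x == y) %in% TRUE
}

df <- data.frame(
  id = 1:3,
  date.of.death = c("2012-01-01 00:00:00.000", "NULL", "2012-01-02 00:00:00.000"),
  date.of.discharge = c("2012-01-01", "2012-01-01", "2012-01-01"),
  stringsAsFactors = FALSE
)

df$inpatient.death <- as.Date(df$date.of.death, format = "%Y-%m-%d") %==%
  as.Date(df$date.of.discharge, format = "%Y-%m-%d")
print(df)
```

Both forms produce the same `inpatient.death` column; the infix version mainly reads more naturally at the call site.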