Wrong format with lubridate when doing conditional assignment with data.table in R

Asked: 2016-05-07 17:36:39

Tags: r data.table conditional lubridate

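I have a data.table with a column of dates. I need to create a new column that adds 1 or 2 years, depending on whether the original date falls before or after a certain date.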

library(data.table); library(lubridate)

name  = c("A", "B", "C")
paid  = c("5/30/2016", "6/30/2016", "7/30/2016")
data  = data.table(name,paid)

new_release = mdy("6/1/2017")

data[, paid := mdy(paid)]

data[, change_date:= ifelse(paid + years(1) < new_release, 
                            paid + years(2), paid +years(1)) ]

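I get the following result: the new column comes back as a number, and lubridate is not able to convert it to a date. I have tried wrapping the ifelse call in mdy, but that does not work either. I know the condition itself works, because if I replace the assigned values with TRUE/FALSE, the values are assigned correctly.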

> data
   name       paid change_date
1:    A 2016-05-30  1527638400
2:    B 2016-06-30  1498780800
3:    C 2016-07-30  1501372800
> str(data)
Classes ‘data.table’ and 'data.frame':  3 obs. of  3 variables:
 $ name       : chr  "A" "B" "C"
 $ paid       : POSIXct, format: "2016-05-30" "2016-06-30" "2016-07-30"
 $ change_date: num  1.53e+09 1.50e+09 1.50e+09
 - attr(*, ".internal.selfref")=<externalptr>

2 answers:

Answer 0 (score: 3)

I would simply leave lubridate out and do everything with base R's Date type:

library(data.table)

name  <- c("A", "B", "C")
paid  <- as.Date(c("2016-05-30", "2016-06-30", "2016-07-30"))
data  <- data.table(name,paid)

new_release <- as.Date("2017-06-01")
year <- 365.25   # approximate length of a year, in days

data[, change_date := as.Date(ifelse(paid + year < new_release,
                                     paid + year*2,
                                     paid + year),
                              origin = "1970-01-01")]   # ifelse() returns plain numbers

Then:

R> data[]
   name       paid change_date
1:    A 2016-05-30  2018-05-30
2:    B 2016-06-30  2017-06-30
3:    C 2016-07-30  2017-07-30
R> 

ifelse() feels a bit odd in a data.table context. Here is an alternative:

R> data[, cdate := paid+year ]                              # baseline: one year for every row
R> data[paid + year < new_release, cdate := paid + 2*year]  # two years where one is still too early
R> data[]
   name       paid change_date      cdate
1:    A 2016-05-30  2018-05-30 2018-05-30
2:    B 2016-06-30  2017-06-30 2017-06-30
3:    C 2016-07-30  2017-07-30 2017-07-30
R> 
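
One thing to keep in mind with the 365.25-day approximation is that it stores fractional days inside the Date values, even though they print as whole dates. If exact calendar years matter, a minimal base-R sketch could look like the following. Note that add_years is a hypothetical helper written for this illustration, not a base R or data.table function.

```r
# add_years is a hypothetical helper, not part of base R or data.table.
add_years <- function(d, n) {
  lt <- as.POSIXlt(d)       # broken-down time; $year counts years since 1900
  lt$year <- lt$year + n    # shift the calendar year
  as.Date(lt)               # convert back to Date
}

paid        <- as.Date(c("2016-05-30", "2016-06-30", "2016-07-30"))
new_release <- as.Date("2017-06-01")

approx <- paid + 365.25       # Date with a fractional day part
exact  <- add_years(paid, 1)  # Date made of whole days

as.numeric(approx) %% 1       # fractional part left by the approximation
as.numeric(exact) %% 1        # no fractional part

# Same two-step logic as above: baseline first, then override the early rows.
change_date <- add_years(paid, 1)
early <- change_date < new_release
change_date[early] <- add_years(paid[early], 2)
change_date
```

Inside a data.table, the same helper could be used in place of `paid + year` in the two `:=` steps shown above.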

Answer 1 (score: 2)

The problem is that ifelse strips attributes, so the date class is dropped (see ?ifelse). To get the date format back, you can wrap the ifelse call in as.Date with origin = '1970-01-01':

data[, change_date := as.Date(ifelse(paid + years(1) < new_release, 
                                     paid + years(2), 
                                     paid + years(1)), 
                              origin = '1970-01-01')]

which gives:

> data
   name       paid change_date
1:    A 2016-05-30  2018-05-30
2:    B 2016-06-30  2017-06-30
3:    C 2016-07-30  2017-07-30
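
To see the attribute stripping in isolation, here is a minimal base-R sketch (the variable names are only illustrative). It also shows why the unit matters: a stripped Date is a count of days since 1970-01-01, while a stripped POSIXct, which is what the str() output in the question shows for paid, is a count of seconds. In the POSIXct case the analogous repair would be as.POSIXct with an origin and a time zone rather than as.Date.

```r
d <- as.Date("2016-05-30")
p <- as.POSIXct("2016-05-30", tz = "UTC")

x_date <- ifelse(TRUE, d, d)   # class attribute is dropped
x_ct   <- ifelse(TRUE, p, p)   # same here

class(x_date)   # "numeric": days since 1970-01-01
class(x_ct)     # "numeric": seconds since 1970-01-01 00:00:00 UTC

# Restoring the class needs the conversion that matches the original type.
back_date <- as.Date(x_date, origin = "1970-01-01")
back_ct   <- as.POSIXct(x_ct, origin = "1970-01-01", tz = "UTC")
```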

Alternatively, correct it afterwards by assigning the class of the paid column to the change_date column:

data[, change_date := ifelse(paid + years(1) < new_release, 
                             paid + years(2), 
                             paid + years(1))]
class(data$change_date) <- class(data$paid)

which gives you the same result.

An alternative to ifelse that achieves the same thing (still using lubridate):

data[, change_date := paid + years(as.numeric((paid + years(1) < new_release) + 1))]

which gives:

> data
   name       paid change_date
1:    A 2016-05-30  2018-05-30
2:    B 2016-06-30  2017-06-30
3:    C 2016-07-30  2017-07-30
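
Not part of the original answer: newer data.table releases provide fifelse(), which keeps the attributes of its yes argument, so the Date class survives the conditional. The sketch below assumes a data.table version that ships fifelse (1.12.4 or later, as far as I know) and a lubridate version in which mdy() returns a Date rather than a POSIXct.

```r
library(data.table)
library(lubridate)

data <- data.table(name = c("A", "B", "C"),
                   paid = mdy(c("5/30/2016", "6/30/2016", "7/30/2016")))
new_release <- mdy("6/1/2017")

# fifelse() requires yes and no to have the same type and class;
# both branches here are Date, so the result stays a Date.
data[, change_date := fifelse(paid + years(1) < new_release,
                              paid + years(2),
                              paid + years(1))]

class(data$change_date)
```

With this variant no as.Date(..., origin = ...) wrapper or class reassignment is needed afterwards.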