R: data.table returns an empty result when matching rows actually exist

Time: 2018-12-08 11:19:15

Tags: r data.table

I have data that contains dates.

newdata <- data.table(example.dates)

> newdata
       start_date  paid_date
    1: 2014-08-01 2015-09-24
    2: 2015-08-01 2015-10-22
    3: 2015-10-01 2015-12-45
    4: 2015-11-01 2016-03-23
    5: 2016-12-01 2017-02-06
   ---                      
  100: 2018-02-05 2018-04-28
  101: 2018-03-02 2018-07-18
  102: 2018-06-14 2018-10-13
  103: 2018-08-16 2018-11-04
  104: 2018-10-19 2018-11-22

I have a function that computes the difference between two dates in months:

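library(zoo)       # as.yearmon()
library(magrittr)  # %>%

# Difference between two dates in months (difference in fractional years * 12)
difference_month <- function(new_date, old_date) {
  start_date <- old_date %>% as.Date() %>% as.yearmon()
  end_date <- new_date %>% as.Date() %>% as.yearmon()
  diff_mon <- (end_date - start_date) * 12
  return(diff_mon)
}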

and I used it to add a 'diff' column to the newdata table.

newdata[,diff := difference_month(paid_date,start_date)]

> newdata
      start_date  paid_date diff
    1: 2014-08-01 2015-09-24  13
    2: 2015-08-01 2015-10-22  2
    3: 2015-10-01 2015-12-45  2
    4: 2015-11-01 2016-03-23  4
    5: 2016-12-01 2017-02-06  2
   ---                      
  100: 2018-02-05 2018-04-28  2
  101: 2018-03-02 2018-07-18  4
  102: 2018-06-14 2018-10-13  4
  103: 2018-08-16 2018-11-04  3
  104: 2018-10-19 2018-11-22  1

However, this is what happens when I try to view the rows with a 2-month difference:

> newdata[diff == 2]
Empty data.table (0 rows) of 3 cols: start_date,paid_date,diff

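Yet when I take the 'diff' value from a row that has a 2-month difference and use that value to look up all rows with a 2-month difference, it works: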

x <- newdata[2][[3]]

> newdata[diff == x]
  start_date  paid_date diff
1: 2015-08-01 2015-10-22  2
2: 2015-10-01 2015-12-45  2
3: 2016-12-01 2017-02-06  2                      
4: 2018-02-05 2018-04-28  2

I checked with str(), and 'diff' is numeric.

Why does the following return an empty table when rows with a 2-month difference actually exist?

newdata[diff == 2]

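1 Answer:

Answer 0: (score: 0)

Using newdata from the Note at the end: floating-point calculations can produce rounding error, so the month difference is not necessarily stored as exactly 2 even though it prints as 2. Wrapping the calculation in round() fixes the comparison. Also note that as.yearmon can convert the date columns directly, so as.Date is not needed.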
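To make the rounding error visible, here is a minimal base R sketch. It assumes that zoo represents a yearmon value numerically as year + (month - 1) / 12; the variables aug_2015 and oct_2015 are illustrative stand-ins for as.yearmon() results, not calls into zoo itself.

```r
# Assumed representation: a "yearmon" is the number year + (month - 1) / 12.
# These two values mimic August 2015 and October 2015.
aug_2015 <- (12 * 2015 + 7) / 12
oct_2015 <- (12 * 2015 + 9) / 12

# Same arithmetic as difference_month(): difference in years times 12
d <- (oct_2015 - aug_2015) * 12

print(d)                         # default printing shows 7 significant digits
print(sprintf("%.17g", d))       # full precision of the stored double
print(d == 2)                    # exact comparison, sensitive to tiny residuals
print(isTRUE(all.equal(d, 2)))   # comparison with a numeric tolerance
print(round(d) == 2)             # comparison after rounding to whole months
```

If the exact comparison fails while the tolerant and rounded ones succeed, that matches the symptom in the question: the column prints as 2, but no row equals the literal 2.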

library(data.table)
library(zoo)

newdata[, diff := round(12 * (as.yearmon(paid_date) - as.yearmon(start_date)))]
newdata[diff == 2]

giving:

   start_date  paid_date diff
1: 2015-08-01 2015-10-22    2
2: 2016-12-01 2017-02-06    2
3: 2018-02-05 2018-04-28    2
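As an alternative to rounding, which is not part of the original answer, the month difference can be computed with integer arithmetic so that no floating-point error arises in the first place. The sketch below uses only base R; months_between is a hypothetical helper name, and the sample dates are a few valid rows taken from the question.

```r
# Whole-month difference using integer arithmetic only.
# months_between is a hypothetical helper, not part of zoo or data.table.
months_between <- function(new_date, old_date) {
  new_lt <- as.POSIXlt(as.Date(new_date, format = "%Y-%m-%d"))
  old_lt <- as.POSIXlt(as.Date(old_date, format = "%Y-%m-%d"))
  # $year counts years since 1900 and $mon is 0-based; both are integers
  12L * (new_lt$year - old_lt$year) + (new_lt$mon - old_lt$mon)
}

start_date <- c("2014-08-01", "2015-08-01", "2015-11-01", "2016-12-01")
paid_date  <- c("2015-09-24", "2015-10-22", "2016-03-23", "2017-02-06")

diff_months <- months_between(paid_date, start_date)
print(diff_months)               # month differences as integers
print(which(diff_months == 2L))  # positions with a 2-month difference
```

With a data.table, the same helper could presumably be used as newdata[, diff := months_between(paid_date, start_date)], after which newdata[diff == 2L] compares integers exactly.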

Note

The input in reproducible form:

Lines <- "
start_date  paid_date
2014-08-01 2015-09-24
2015-08-01 2015-10-22
2015-10-01 2015-12-45
2015-11-01 2016-03-23
2016-12-01 2017-02-06
2018-02-05 2018-04-28
2018-03-02 2018-07-18
2018-06-14 2018-10-13
2018-08-16 2018-11-04
2018-10-19 2018-11-22"

library(data.table)
newdata <- fread(Lines)