The goal is to compute the time between events, grouped by ID. Here is an example:
library(data.table)
library(lubridate)
dt <- data.table(id = c(1,1:3),
start = c("2015-01-01 12:00:00", "2015-12-01 12:00:00", "2019-01-01 12:00:00", NA),
end = c("2016-01-01 12:00:01", "2016-01-01 12:00:01", "2019-01-01 12:00:01", "2019-01-01 12:00:02"))
dt[, start := ymd_hms(start)]
dt[, end := ymd_hms(end)]
dt[, time_diff_1 := min(end) - max(start), by = .(id)]
dt[, time_diff_2 := end - start]
The result is:
   id               start                 end   time_diff_1   time_diff_2
1:  1 2015-01-01 12:00:00 2016-01-01 12:00:01 31.00001 secs 31536001 secs
2:  1 2015-12-01 12:00:00 2016-01-01 12:00:01 31.00001 secs  2678401 secs
3:  2 2019-01-01 12:00:00 2019-01-01 12:00:01  1.00000 secs        1 secs
4:  3                <NA> 2019-01-01 12:00:02       NA secs       NA secs
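Both columns, time_diff_1 and time_diff_2, are labelled as seconds. However, time_diff_1, which is computed by group, mixes units: the value for id == 1 is really 31 days and one second (31.00001 days), not 31.00001 seconds. It looks as if the unit is chosen automatically within each group and then overwritten when the groups are combined.
Any hints on how to solve this?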
Answer 0 (score: 0)
When you use the difftime() function, you can set the units explicitly, e.g.
dt[, time_diff_3 := difftime(min(end), max(start), units = "secs"), by = .(id)]
which results in
   id               start                 end   time_diff_1   time_diff_2  time_diff_3
1:  1 2015-01-01 12:00:00 2016-01-01 12:00:01 31.00001 secs 31536001 secs 2678401 secs
2:  1 2015-12-01 12:00:00 2016-01-01 12:00:01 31.00001 secs  2678401 secs 2678401 secs
3:  2 2019-01-01 12:00:00 2019-01-01 12:00:01  1.00000 secs        1 secs       1 secs
4:  3                <NA> 2019-01-01 12:00:02       NA secs       NA secs      NA secs
with the expected result in column time_diff_3.
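For what it's worth, the per-group unit choice can be reproduced in base R without data.table. Subtracting two POSIXct values goes through difftime() with units = "auto", which picks the largest unit (up to days) in which the difference is at least one. The sketch below uses the timestamps from the question; the variable names and the UTC time zone are my own assumptions for illustration.

```r
# Timestamps taken from the question; tz = "UTC" is an assumption
# so that the example does not depend on the local time zone.
start_late <- as.POSIXct("2015-12-01 12:00:00", tz = "UTC")
end_first  <- as.POSIXct("2016-01-01 12:00:01", tz = "UTC")

# `-` on POSIXct uses difftime(..., units = "auto")
d_auto <- end_first - start_late
units(d_auto)       # "days"

# A one-second difference gets a different automatic unit
d_short <- (start_late + 1) - start_late
units(d_short)      # "secs"

# Setting the unit explicitly makes both results comparable
d_secs <- difftime(end_first, start_late, units = "secs")
as.numeric(d_secs)  # 2678401
```

So each group can come back in a different unit, and only one units attribute can survive on the combined column.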
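Still, there may be room for improvement in how data.table silently overwrites the units after a grouped computation. It had me scratching my head for a while before I realised that the units had been mixed up.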
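If you do not need the difftime class downstream, one possible workaround (a sketch, not the only way) is to drop the units attribute altogether and store plain numeric seconds, so there is nothing left for the grouping step to overwrite. The column name diff_secs is made up for this example, and I build the timestamps with as.POSIXct() instead of lubridate only to keep the snippet self-contained; the NA row from the question is left out for brevity.

```r
library(data.table)

# Reduced version of the question's data
dt <- data.table(
  id    = c(1, 1, 2),
  start = as.POSIXct(c("2015-01-01 12:00:00", "2015-12-01 12:00:00",
                       "2019-01-01 12:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-01 12:00:01", "2016-01-01 12:00:01",
                       "2019-01-01 12:00:01"), tz = "UTC")
)

# Fix the unit inside each group, then convert to a bare number
# so the column carries no units attribute at all.
dt[, diff_secs := as.numeric(difftime(min(end), max(start), units = "secs")),
   by = id]

print(dt)
```

The trade-off is that the column no longer prints its unit, so it is worth keeping the unit in the column name.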