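I have a very large dataset and need to split each time interval into one row per calendar date for further analysis.
Here is a sample of my dataset: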
require(data.table)
RawDT = data.table(
TimeStampID = c("4"),
DateTimeFrom = c("2019-02-10 16:28:03"),
DateTimeTo = c("2019-02-12 02:04:03")
)
Here is the desired result:
ResultDT = data.table(
ID = c("1","2","3"),
TimeStampID = c("4","4","4"),
DS = c("2019-02-10","2019-02-11","2019-02-12"),
TimeFrom = c("16:28:03","00:00:00","00:00:00"),
TimeTo = c("23:59:59","23:59:59","02:04:03")
)
Can someone point me to a function I can use to get ResultDT from RawDT?
Answer 0 (score: 1)
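OK, this is a borderline duplicate, so I encourage the moderators to close the thread and delete my post if they see fit.
However, I ran into a similar problem with year starts and ends (here); it is not identical, which is why I am answering. @Jaap came up with a great, concise solution there, and the same logic can be applied here, for example: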
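library(data.table)
# Parse the character columns as date-times. Assumption: a fixed tz = "UTC" is used so that
# as.Date(), which defaults to UTC for POSIXct, sees the same calendar day as the timestamps.
RawDT[, `:=` (DateTimeFrom = as.POSIXct(DateTimeFrom, tz = "UTC"), DateTimeTo = as.POSIXct(DateTimeTo, tz = "UTC"))]
# Repeat each row once per calendar day it spans, then clip each copy to its own day.
RawDT[RawDT[, rep(.I, 1L + as.integer(as.Date(DateTimeTo) - as.Date(DateTimeFrom)))]
      ][, `:=` (DateTimeFrom = pmax(DateTimeFrom[1], as.POSIXct(paste0(as.Date(DateTimeFrom[1]) + 0:(.N-1), ' 00:00:00'), tz = "UTC")),
                DateTimeTo = pmin(DateTimeTo[.N], as.POSIXct(paste0(as.Date(DateTimeTo[.N]) - (.N-1):0, ' 23:59:59'), tz = "UTC")))
        , by = .(TimeStampID, rleid(DateTimeFrom))][]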
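To make the first step of that chain easier to follow, here is a minimal sketch of the row expansion on its own. The names Demo, NDays, Idx and Expanded are made up for illustration, and tz = "UTC" is an assumption so that as.Date() and the timestamps agree on the calendar day.

```r
library(data.table)

# Hypothetical one-row copy of the question's data (Demo is not a name from the post)
Demo = data.table(
  TimeStampID = "4",
  DateTimeFrom = as.POSIXct("2019-02-10 16:28:03", tz = "UTC"),
  DateTimeTo = as.POSIXct("2019-02-12 02:04:03", tz = "UTC")
)

# Number of calendar days each interval touches (the 10th, 11th and 12th here)
NDays = Demo[, 1L + as.integer(as.Date(DateTimeTo) - as.Date(DateTimeFrom))]

# Row indices repeated once per day; indexing with them duplicates the rows
Idx = Demo[, rep(.I, NDays)]
Expanded = Demo[Idx]
```

Each duplicated row still carries the original start and end, which is why the second step clips them with pmax() and pmin() per group.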
I added another group to your DT, just to test the functionality:
RawDT = data.table(
TimeStampID = c("4", "5"),
DateTimeFrom = c("2019-02-10 16:28:03", "2019-03-15 12:28:03"),
DateTimeTo = c("2019-02-12 02:04:03", "2019-03-20 14:45:00")
)
The output of the code above is:
   TimeStampID        DateTimeFrom          DateTimeTo
1:           4 2019-02-10 16:28:03 2019-02-10 23:59:59
2:           4 2019-02-11 00:00:00 2019-02-11 23:59:59
3:           4 2019-02-12 00:00:00 2019-02-12 02:04:03
4:           5 2019-03-15 12:28:03 2019-03-15 23:59:59
5:           5 2019-03-16 00:00:00 2019-03-16 23:59:59
6:           5 2019-03-17 00:00:00 2019-03-17 23:59:59
7:           5 2019-03-18 00:00:00 2019-03-18 23:59:59
8:           5 2019-03-19 00:00:00 2019-03-19 23:59:59
9:           5 2019-03-20 00:00:00 2019-03-20 14:45:00
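This output keeps full timestamps, whereas the question asked for separate ID, DS, TimeFrom and TimeTo columns. One possible way to get there is sketched below. SplitDT is a hypothetical name for the split result; it is rebuilt by hand here from rows 1 to 3 of the output so the snippet runs on its own. Treating ID as a running row number is also an assumption.

```r
library(data.table)

# Hypothetical name: SplitDT holds the split intervals (rows 1-3 of the output above)
SplitDT = data.table(
  TimeStampID = c("4", "4", "4"),
  DateTimeFrom = as.POSIXct(c("2019-02-10 16:28:03", "2019-02-11 00:00:00",
                              "2019-02-12 00:00:00"), tz = "UTC"),
  DateTimeTo = as.POSIXct(c("2019-02-10 23:59:59", "2019-02-11 23:59:59",
                            "2019-02-12 02:04:03"), tz = "UTC")
)

# Derive the date and time-of-day strings the question asked for
ResultDT = SplitDT[, .(
  ID = as.character(.I),                        # running row number (assumption)
  TimeStampID,
  DS = format(DateTimeFrom, "%Y-%m-%d"),        # calendar date of the slice
  TimeFrom = format(DateTimeFrom, "%H:%M:%S"),  # start time within that date
  TimeTo = format(DateTimeTo, "%H:%M:%S")       # end time within that date
)]
```

format() returns character columns, which matches the character vectors in the question's ResultDT; keep the POSIXct columns instead if you plan to do further date arithmetic.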