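I have the following data, where each row corresponds to a travelling household member. Because these are household members, rows can overlap in time (see rows 1 and 2). The trip duration is recorded in minutes. IDX is just an index that keeps the transformation traceable.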
IDX | ID | Trip | StartDateTime | Duration (in minutes)
1 | 1 | 1 | 2015-01-21 13:00 | 100
2 | 1 | 1 | 2015-01-21 13:00 | 184
3 | 1 | 1 | 2015-01-21 10:00 | 91
4 | 1 | 2 | 2015-01-22 13:00 | 30
5 | 2 | 2 | 2015-01-30 23:00 | 100
Now I want to split the data for each ID, trip and day into hourly rows, like this:
IDX | ID | Trip | StartDateTime | Duration (in minutes)
1 | 1 | 1 | 2015-01-21 13:00 | 60
1 | 1 | 1 | 2015-01-21 14:00 | 40
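Note that the total duration of this group is still 100, as in the first row of the original data. Also, IDX is carried over from that first row. Row 4, however, does not exceed 60 minutes, so it is not split. Result: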
IDX | ID | Trip | StartDateTime | Duration (in minutes)
4 | 1 | 2 | 2015-01-22 13:00 | 30
The hardest case is row 5, where the trip actually runs past midnight into the next day. It would become:
IDX | ID | Trip | StartDateTime | Duration (in minutes)
5 | 2 | 2 | 2015-01-30 23:00 | 60
5 | 2 | 2 | 2015-01-31 0:00 | 40
Is it possible to expand a table like this?
Code to build the table:
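library(data.table)
# Stored as `dt`; the answers below refer to this table as DT and dt.
# as.POSIXct is used because data.table keeps date-times as POSIXct,
# whereas strptime returns POSIXlt.
dt <- data.table(IDX = 1:5,
                 ID = c(1,1,1,2,2),
                 Trip = c(1,1,1,1,2),
                 StartDateTime = as.POSIXct(c("2015-01-21 13:00","2015-01-21 13:00","2015-01-21 10:00","2015-01-22 13:00","2015-01-30 23:00"), format="%Y-%m-%d %H:%M"),
                 Duration = c(100,184,91,30,100)
)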
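Update: the start time can be something like 13:12, but I am not interested in the exact start time, only in the clock hour it falls into.
So if the start time is not exactly on the hour, for example: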
IDX | ID | Trip | StartDateTime | Duration (in minutes)
6 | 3 | 1 | 2015-01-30 23:14 | 67
then we get:
IDX | ID | Trip | StartDateTime | Duration (in minutes)
6 | 3 | 1 | 2015-01-30 23:00 | 46
6 | 3 | 1 | 2015-01-31 0:00 | 21
I apologize for not making this part clear earlier, but I assumed it would be a simple post-processing step on eddi's solution.
Thanks
Answer 0 (score: 3)
This is very similar to @eddi's answer, but uses base difftime rather than lubridate functions:
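# modifying the example so that row 1 starts off the hour:
DT[1, StartDateTime := as.POSIXct("2015-01-21 13:12")]
DT[,{
  t0 = StartDateTime                 # trip start
  t1 = StartDateTime + Duration*60   # trip end (Duration is in minutes)
  h0 = trunc(t0, units="hour")       # clock hour containing the start
  h1 = trunc(t1, units="hour")       # clock hour containing the end
  h = seq(h0, h1, by="hour")         # one entry per clock hour touched
  nh = length(h)
  # default: a full 60 minutes for every hour
  dur = as.difftime(rep("1",nh), format="%H", units="mins")
  # partial first hour, partial last hour, or a trip inside a single hour
  if (h0 < t0) dur[1 ] = difftime(h0 + as.difftime("1", format="%H", units="mins"), t0)
  if (h1 < t1) dur[nh] = difftime(t1, h1)
  if (h0 == h1) dur = difftime(t1, t0)
  list(h = h, dur = dur)
}, by=.(IDX, ID, Trip)]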
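The difftime building blocks used above can be tried on a single timestamp. The following is only a minimal base R sketch: the variable names `one_hour` and `first_piece` are illustrative, and the time zone is fixed to UTC purely so the result does not depend on the local setting.

```r
# One hour expressed as a difftime in minutes, built the same way as in the answer
one_hour <- as.difftime("1", format = "%H", units = "mins")

# Row 1 of the modified example starts at 13:12
t0 <- as.POSIXct("2015-01-21 13:12", tz = "UTC")
h0 <- as.POSIXct(trunc(t0, units = "hours"))   # 13:00, the clock hour containing t0

# Minutes left in the first clock hour: from 13:12 up to 14:00
first_piece <- difftime(h0 + one_hour, t0, units = "mins")
as.numeric(first_piece)
# [1] 48
```

Passing `units = "mins"` explicitly keeps the result in minutes; without it, difftime picks a unit automatically based on the size of the difference.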
Answer 1 (score: 2)
dt[, .(IDX, ID, Trip,
StartDateTime = StartDateTime + 60*seq(0, Duration, 60),
Duration = diff(c(seq(0, Duration, 60), Duration)))
, by = 1:nrow(dt)]
# nrow IDX ID Trip StartDateTime Duration
# 1: 1 1 1 1 2015-01-21 13:00:00 60
# 2: 1 1 1 1 2015-01-21 14:00:00 40
# 3: 2 2 1 1 2015-01-21 13:00:00 60
# 4: 2 2 1 1 2015-01-21 14:00:00 60
# 5: 2 2 1 1 2015-01-21 15:00:00 60
# 6: 2 2 1 1 2015-01-21 16:00:00 4
# 7: 3 3 1 1 2015-01-21 10:00:00 60
# 8: 3 3 1 1 2015-01-21 11:00:00 31
# 9: 4 4 2 1 2015-01-22 13:00:00 30
#10: 5 5 2 2 2015-01-30 23:00:00 60
#11: 5 5 2 2 2015-01-31 00:00:00 40
Here is a modification for start times that are not on the hour:
dt[5, StartDateTime := StartDateTime + 14*60]
library(lubridate)
dt[, {dur = diff(c(minute(StartDateTime),
tail(seq(0, Duration, 60), -1),
Duration + minute(StartDateTime)))
list(StartDateTime = floor_date(StartDateTime, "hour") + (seq_along(dur)-1)*3600,
Duration = dur)}
, by = .(IDX, ID, Trip)]
# IDX ID Trip StartDateTime Duration
# 1: 1 1 1 2015-01-21 13:00:00 60
# 2: 1 1 1 2015-01-21 14:00:00 40
# 3: 2 1 1 2015-01-21 13:00:00 60
# 4: 2 1 1 2015-01-21 14:00:00 60
# 5: 2 1 1 2015-01-21 15:00:00 60
# 6: 2 1 1 2015-01-21 16:00:00 4
# 7: 3 1 1 2015-01-21 10:00:00 60
# 8: 3 1 1 2015-01-21 11:00:00 31
# 9: 4 2 1 2015-01-22 13:00:00 30
#10: 5 2 2 2015-01-30 23:00:00 46
#11: 5 2 2 2015-01-31 00:00:00 54
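The expression above takes its break points from `seq(0, Duration, 60)`, which lines up with the clock-hour boundaries for the sample rows. For other combinations (for instance a start at 13:50 with a duration of 20 minutes) it would not, so the sketch below places the breaks at clock-hour boundaries directly. It is a base R sketch only: `split_by_hour` is a hypothetical helper that does not come from either answer, it handles one trip at a time, and the UTC time zone is an assumption made to keep the example reproducible.

```r
# Hypothetical helper (not from the answers): split one trip into clock-hour pieces.
split_by_hour <- function(start, duration) {
  m   <- as.integer(format(start, "%M"))     # minutes past the hour at the start
  end <- m + duration                        # end offset, counted from the top of that hour
  k   <- max(0, ceiling(end / 60) - 1)       # hour boundaries strictly inside the trip
  dur <- diff(c(m, seq_len(k) * 60, end))    # minutes falling into each clock hour
  h0  <- as.POSIXct(trunc(start, units = "hours"))
  data.frame(StartDateTime = h0 + (seq_along(dur) - 1) * 3600,
             Duration      = dur)
}

# Row 6 from the question's update: 23:14 start, 67 minutes
res <- split_by_hour(as.POSIXct("2015-01-30 23:14", tz = "UTC"), 67)
res$Duration
# [1] 46 21
```

The two pieces belong to the 23:00 hour of 2015-01-30 and the 00:00 hour of 2015-01-31. To apply the helper to a whole table, it could be called once per row (for example inside a grouped data.table call by IDX), with the IDX, ID and Trip columns carried along by the grouping.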