Split data.table rows into hourly rows in R

Time: 2015-10-08 18:18:10

Tags: r data.table

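I have the following data, where each row corresponds to one household member's trip. Because these are household members, rows can overlap in time (e.g. rows 1 and 2). The trip duration is recorded in minutes. IDX is just an index that keeps the transformation traceable.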

IDX  | ID   | Trip |   StartDateTime    | Duration (in minutes)
1    |  1   |  1   |  2015-01-21 13:00  | 100
2    |  1   |  1   |  2015-01-21 13:00  | 184
3    |  1   |  1   |  2015-01-21 10:00  | 91
4    |  1   |  2   |  2015-01-22 13:00  | 30
5    |  2   |  2   |  2015-01-30 23:00  | 100

Now I want to split the data for each ID, trip and day into hourly rows, like this:

IDX |  ID   | Trip |   StartDateTime      | Duration (in minutes)
1   |  1    |  1   |  2015-01-21 13:00    | 60
1   |  1    |  1   |  2015-01-21 14:00    | 40

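Note that the total duration of this group is still 100, the same as in the original first row. Second, IDX is carried over from the original row. Row 4, however, does not exceed 60 minutes, so it is not split. Result: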

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
4    |  1   |  2   |  2015-01-22 13:00    | 30

The hardest case is row 5, because the trip actually crosses into the next day. It becomes:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
5    |  2   |  2   |  2015-01-30 23:00    | 60
5    |  2   |  2   |  2015-01-31 00:00    | 40
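A minimal sketch of why the day boundary needs no special handling, assuming StartDateTime is stored as POSIXct (the UTC time zone here is an assumption, chosen only to keep daylight-saving shifts out of the example). POSIXct arithmetic is in seconds, so adding one hour to 23:00 rolls the date over by itself, and the per-hour minutes for row 5 are just the gaps between full-hour offsets:

```r
# POSIXct counts seconds, so +3600 moves 23:00 into the next calendar day
start     <- as.POSIXct("2015-01-30 23:00", tz = "UTC")
next_hour <- start + 3600
format(next_hour, "%Y-%m-%d %H:%M")   # "2015-01-31 00:00"

# Row 5's 100 minutes, split at every full hour after the start
diff(c(seq(0, 100, by = 60), 100))    # 60 40
```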

Is it possible to expand a table like this?

Code to build the table:

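library(data.table)

dt <- data.table(IDX = 1:5,
           ID  = c(1,1,1,2,2),
           Trip = c(1,1,1,1,2),
           # as.POSIXct instead of strptime: strptime returns POSIXlt, which data.table does not store as-is
           StartDateTime = as.POSIXct(c("2015-01-21 13:00","2015-01-21 13:00","2015-01-21 10:00","2015-01-22 13:00","2015-01-30 23:00"), format="%Y-%m-%d %H:%M"),
           Duration = c(100,184,91,30,100)   # minutes
)
DT <- copy(dt)   # the first answer below works on a copy named DT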

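Update: the start time can be something like 13:12, but I am not interested in the exact start minute; what I really want is the data per clock hour.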

So if the start time is not on the full hour, for example:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
6    |  3   |  1   |  2015-01-30 23:14    | 67
then we get:

IDX  | ID   | Trip |   StartDateTime      | Duration (in minutes)
6    |  3   |  1   |  2015-01-30 23:00    | 46
6    |  3   |  1   |  2015-01-31 00:00    | 21
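The arithmetic behind this off-hour example can be sketched as a small base-R function. It is a hypothetical helper, not from the question or the answers, and it assumes start is a POSIXct with whole minutes and duration is in minutes. The breakpoints are the start minute, every full-hour boundary the trip crosses, and the end; the gaps between them are the minutes per clock hour (23:14 plus 67 minutes ends at 00:21, giving 46 and 21):

```r
# Hypothetical helper: split one trip into the minutes spent in each clock hour.
# Assumes `start` is POSIXct (whole minutes) and `duration` is in minutes.
hourly_minutes <- function(start, duration) {
  m   <- as.POSIXlt(start)$min   # minutes past the hour when the trip starts
  end <- m + duration            # trip end, in minutes after the start's clock hour
  # Breakpoints: start, each full-hour boundary crossed, and the end.
  # unique() drops the duplicate when the trip ends exactly on the hour.
  brk <- unique(c(m, seq(0, end, by = 60)[-1], end))
  data.frame(
    StartDateTime = as.POSIXct(trunc(start, units = "hours")) +
                    3600 * (seq_len(length(brk) - 1) - 1),
    Duration      = diff(brk)
  )
}

row6 <- hourly_minutes(as.POSIXct("2015-01-30 23:14", tz = "UTC"), 67)
row6$Duration   # 46 21
```

Because the row's IDX, ID and Trip are constant, the same logic can run once per row inside a grouped data.table call.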

Sorry for not clarifying this part earlier, but I think it is a simple post-processing step on top of eddi's solution.

Thanks

2 answers:

Answer 0 (score: 3)

This is very similar to @eddi's answer, but it uses base R's trunc and difftime instead of lubridate functions:

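library(data.table)

# modifying the example so that row 1 starts at 13:12, off the hour:
DT[1, StartDateTime := as.POSIXct("2015-01-21 13:12")]

DT[,{
    t0  = StartDateTime                # trip start
    t1  = StartDateTime + Duration*60  # trip end (Duration is in minutes)

    # clock hours containing the start and the last second of the trip;
    # using t1 - 1 stops a trip that ends exactly on the hour from getting an extra row
    h0  = as.POSIXct(trunc(t0, units = "hours"))
    h1  = as.POSIXct(trunc(t1 - 1, units = "hours"))
    h   = seq(h0, h1, by = "hour")
    nh  = length(h)

    dur = as.difftime(rep(60, nh), units = "mins")                   # full hours by default
    if (h0 <  t0) dur[1 ] = difftime(h0 + 3600, t0, units = "mins")  # partial first hour
    dur[nh] = difftime(t1, h1, units = "mins")                       # last (possibly partial) hour
    if (h0 == h1) dur     = difftime(t1, t0, units = "mins")         # trip within a single hour

    list(StartDateTime = h, Duration = as.numeric(dur))
}, by=.(IDX, ID, Trip)]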

Answer 1 (score: 2)

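# one group per row; seq(0, Duration, 60) gives the minute offset of each hourly piece
dt[, .(IDX, ID, Trip,
       StartDateTime = StartDateTime + 60*seq(0, Duration, 60),   # hourly start times (POSIXct + seconds)
       Duration = diff(c(seq(0, Duration, 60), Duration)))         # minutes in each piece
   , by = 1:nrow(dt)]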
#    nrow IDX ID Trip       StartDateTime Duration
# 1:    1   1  1    1 2015-01-21 13:00:00       60
# 2:    1   1  1    1 2015-01-21 14:00:00       40
# 3:    2   2  1    1 2015-01-21 13:00:00       60
# 4:    2   2  1    1 2015-01-21 14:00:00       60
# 5:    2   2  1    1 2015-01-21 15:00:00       60
# 6:    2   2  1    1 2015-01-21 16:00:00        4
# 7:    3   3  1    1 2015-01-21 10:00:00       60
# 8:    3   3  1    1 2015-01-21 11:00:00       31
# 9:    4   4  2    1 2015-01-22 13:00:00       30
#10:    5   5  2    2 2015-01-30 23:00:00       60
#11:    5   5  2    2 2015-01-31 00:00:00       40
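One edge case worth checking in this expansion: when Duration is an exact multiple of 60, seq(0, Duration, 60) already ends at Duration, so diff() emits a trailing 0-minute row. A minimal sketch with a hypothetical one-row table dt2 (not part of the question's data), assuming such empty pieces should simply be dropped:

```r
library(data.table)

# Hypothetical one-row table: a 120-minute trip starting on the hour
dt2 <- data.table(IDX = 1, ID = 1, Trip = 1,
                  StartDateTime = as.POSIXct("2015-01-21 13:00", tz = "UTC"),
                  Duration = 120)

res <- dt2[, .(IDX, ID, Trip,
               StartDateTime = StartDateTime + 60*seq(0, Duration, 60),
               Duration = diff(c(seq(0, Duration, 60), Duration)))
           , by = 1:nrow(dt2)]

res$Duration              # 60 60 0: the 15:00 piece has zero minutes
clean <- res[Duration > 0]  # keep only pieces with a positive duration
```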

Here is a modification for start times that are not on the full hour:

dt[5, StartDateTime := StartDateTime + 14*60]

library(lubridate)

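dt[, {m   = minute(StartDateTime)   # minutes past the hour at the start
      # breakpoints in minutes after the start's clock hour: the start, each full
      # hour crossed, and the end; unique() drops a duplicate when the trip ends on the hour
      brk = unique(c(m, seq(0, m + Duration, 60)[-1], m + Duration))
      dur = diff(brk)
      list(StartDateTime = floor_date(StartDateTime, "hour") + (seq_along(dur)-1)*3600,
           Duration = dur)}
   , by = .(IDX, ID, Trip)]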
#    IDX ID Trip       StartDateTime Duration
# 1:   1  1    1 2015-01-21 13:00:00       60
# 2:   1  1    1 2015-01-21 14:00:00       40
# 3:   2  1    1 2015-01-21 13:00:00       60
# 4:   2  1    1 2015-01-21 14:00:00       60
# 5:   2  1    1 2015-01-21 15:00:00       60
# 6:   2  1    1 2015-01-21 16:00:00        4
# 7:   3  1    1 2015-01-21 10:00:00       60
# 8:   3  1    1 2015-01-21 11:00:00       31
# 9:   4  2    1 2015-01-22 13:00:00       30
#10:   5  2    2 2015-01-30 23:00:00       46
#11:   5  2    2 2015-01-31 00:00:00       54