从满足日期条件开始计算行数

时间:2017-10-25 06:47:34

标签: r data.table time-series

我有一个带有以下列的R data.table

Date    Time    X   Y   
5/9/2016    12:00:00 AM 30  125 
5/9/2016    12:30:00 AM 27  127 
5/9/2016    1:00:00 AM  30  133 
5/9/2016    1:30:00 AM  25  115 
5/9/2016    2:00:00 AM  26  116 
5/9/2016    3:00:00 AM  34  128 
5/9/2016    3:30:00 AM  34  120 
5/9/2016    4:00:00 AM  26  133 
5/9/2016    5:00:00 AM  35  119 
5/9/2016    5:30:00 AM  35  126 
5/9/2016    6:00:00 AM  35  118 
5/9/2016    6:30:00 AM  28  120 
5/9/2016    7:00:00 AM  29  123 
5/9/2016    7:30:00 AM  30  125 
5/9/2016    8:30:00 AM  35  116 
5/9/2016    9:00:00 AM  32  123 
5/9/2016    9:30:00 AM  33  135 
5/9/2016    10:00:00 AM 32  127 
5/9/2016    10:30:00 AM 26  122 
5/9/2016    11:00:00 AM 28  122 
5/9/2016    11:30:00 AM 31  129 
5/9/2016    12:00:00 PM 29  127 
5/9/2016    12:30:00 PM 26  120 
5/9/2016    1:00:00 PM  32  126 
5/9/2016    1:30:00 PM  26  117 
5/9/2016    2:30:00 PM  32  133 
5/9/2016    3:00:00 PM  31  120 
5/9/2016    3:30:00 PM  35  129 
5/9/2016    4:00:00 PM  30  121 
5/9/2016    4:30:00 PM  35  132 
5/9/2016    5:00:00 PM  27  129 
5/9/2016    5:30:00 PM  27  122 
5/9/2016    6:00:00 PM  28  121 
5/9/2016    6:30:00 PM  35  127 
5/9/2016    7:00:00 PM  34  132 
5/9/2016    7:30:00 PM  25  133 
5/9/2016    8:00:00 PM  34  119 
5/9/2016    8:30:00 PM  33  125 
5/9/2016    9:00:00 PM  29  120 
5/9/2016    9:30:00 PM  34  125 
5/9/2016    10:00:00 PM 31  118 
5/9/2016    10:30:00 PM 33  118 
5/9/2016    11:00:00 PM 28  115 
5/9/2016    11:30:00 PM 34  126 
5/10/2016   12:00:00 AM 34  131 
5/10/2016   12:30:00 AM 25  130 
5/10/2016   1:00:00 AM  28  129 
5/10/2016   1:30:00 AM  26  132

我想要输出如下 -

Date    Time    X   Y   Rowcount@6hours
    5/9/2016    12:00:00 AM 30  125 0
    5/9/2016    12:30:00 AM 27  127 1
    5/9/2016    1:00:00 AM  30  133 2
    5/9/2016    1:30:00 AM  25  115 3
    5/9/2016    2:00:00 AM  26  116 4
    5/9/2016    3:00:00 AM  34  128 5
    5/9/2016    3:30:00 AM  34  120 6
    5/9/2016    4:00:00 AM  26  133 7
    5/9/2016    5:00:00 AM  35  119 8
    5/9/2016    5:30:00 AM  35  126 9
    5/9/2016    6:00:00 AM  35  118 10
    5/9/2016    6:30:00 AM  28  120 10
    5/9/2016    7:00:00 AM  29  123 10
    5/9/2016    7:30:00 AM  30  125 10
    5/9/2016    8:30:00 AM  35  116 9
    5/9/2016    9:00:00 AM  32  123 10
    5/9/2016    9:30:00 AM  33  135 10
    5/9/2016    10:00:00 AM 32  127 10
    5/9/2016    10:30:00 AM 26  122 10
    5/9/2016    11:00:00 AM 28  122 11
    5/9/2016    11:30:00 AM 31  129 11
    5/9/2016    12:00:00 PM 29  127 11
    5/9/2016    12:30:00 PM 26  120 11
    5/9/2016    1:00:00 PM  32  126 11
    5/9/2016    1:30:00 PM  26  117 11
    5/9/2016    2:30:00 PM  32  133 11
    5/9/2016    3:00:00 PM  31  120 11
    5/9/2016    3:30:00 PM  35  129 11
    5/9/2016    4:00:00 PM  30  121 11
    5/9/2016    4:30:00 PM  35  132 11
    5/9/2016    5:00:00 PM  27  129 11
    5/9/2016    5:30:00 PM  27  122 11
    5/9/2016    6:00:00 PM  28  121 11
    5/9/2016    6:30:00 PM  35  127 11
    5/9/2016    7:00:00 PM  34  132 11
    5/9/2016    7:30:00 PM  25  133 11
    5/9/2016    8:00:00 PM  34  119 11
    5/9/2016    8:30:00 PM  33  125 11
    5/9/2016    9:00:00 PM  29  120 12
    5/9/2016    9:30:00 PM  34  125 12
    5/9/2016    10:00:00 PM 31  118 12
    5/9/2016    10:30:00 PM 33  118 12
    5/9/2016    11:00:00 PM 28  115 12
    5/9/2016    11:30:00 PM 34  126 12
    5/10/2016   12:00:00 AM 34  131 12
    5/10/2016   12:30:00 AM 25  130 12
    5/10/2016   1:00:00 AM  28  129 12
    5/10/2016   1:30:00 AM  26  132 12

所以我的目标是计算我在过去n小时内获得的行数(例如n = 6)并将其附加到每一行(我想使用该值来确定width参数rollmean函数)。这里的数据粒度是30分钟,但它可以更改为秒级别(也可能是非均匀的!)并且可能缺少行

我尝试使用window.zooxts,但无法找到解决方案。我在另一个链接中发布了相同的问题,并且该响应在一天内有效,但不是在几天内Count number of rows between two timestamps in R dataframe

2 个答案:

答案 0 :(得分:4)

使用:

# convert to a 'data.table' (if necessary)
# add a datetime & rowid variable
DT <- as.data.table(df)[, `:=` (DateTime = as.POSIXct(paste(Date, Time), format = '%m/%d/%Y %I:%M:%S%p'),
                                rid = .I)][]

# calculate the count with a rolling join
DT[, count6hr := DT[.(DateTime = DateTime - 6*60*60, rid = rid)
                    , on = 'DateTime'
                    , roll = -Inf
                    , mult = 'last'
                    , i.rid - rid]][]

给出:

         Date       Time  X   Y            DateTime rid count6hr
 1:  5/9/2016 12:00:00AM 30 125 2016-05-09 00:00:00   1        0
 2:  5/9/2016 12:30:00AM 27 127 2016-05-09 00:30:00   2        1
 3:  5/9/2016  1:00:00AM 30 133 2016-05-09 01:00:00   3        2
 4:  5/9/2016  1:30:00AM 25 115 2016-05-09 01:30:00   4        3
 5:  5/9/2016  2:00:00AM 26 116 2016-05-09 02:00:00   5        4
 6:  5/9/2016  3:00:00AM 34 128 2016-05-09 03:00:00   6        5
 7:  5/9/2016  3:30:00AM 34 120 2016-05-09 03:30:00   7        6
 8:  5/9/2016  4:00:00AM 26 133 2016-05-09 04:00:00   8        7
 9:  5/9/2016  5:00:00AM 35 119 2016-05-09 05:00:00   9        8
10:  5/9/2016  5:30:00AM 35 126 2016-05-09 05:30:00  10        9
11:  5/9/2016  6:00:00AM 35 118 2016-05-09 06:00:00  11       10
12:  5/9/2016  6:30:00AM 28 120 2016-05-09 06:30:00  12       10
13:  5/9/2016  7:00:00AM 29 123 2016-05-09 07:00:00  13       10
14:  5/9/2016  7:30:00AM 30 125 2016-05-09 07:30:00  14       10
15:  5/9/2016  8:30:00AM 35 116 2016-05-09 08:30:00  15        9
16:  5/9/2016  9:00:00AM 32 123 2016-05-09 09:00:00  16       10
17:  5/9/2016  9:30:00AM 33 135 2016-05-09 09:30:00  17       10
18:  5/9/2016 10:00:00AM 32 127 2016-05-09 10:00:00  18       10
19:  5/9/2016 10:30:00AM 26 122 2016-05-09 10:30:00  19       10
20:  5/9/2016 11:00:00AM 28 122 2016-05-09 11:00:00  20       11
21:  5/9/2016 11:30:00AM 31 129 2016-05-09 11:30:00  21       11
22:  5/9/2016 12:00:00PM 29 127 2016-05-09 12:00:00  22       11
23:  5/9/2016 12:30:00PM 26 120 2016-05-09 12:30:00  23       11
24:  5/9/2016  1:00:00PM 32 126 2016-05-09 13:00:00  24       11
25:  5/9/2016  1:30:00PM 26 117 2016-05-09 13:30:00  25       11
26:  5/9/2016  2:30:00PM 32 133 2016-05-09 14:30:00  26       11
27:  5/9/2016  3:00:00PM 31 120 2016-05-09 15:00:00  27       11
28:  5/9/2016  3:30:00PM 35 129 2016-05-09 15:30:00  28       11
29:  5/9/2016  4:00:00PM 30 121 2016-05-09 16:00:00  29       11
30:  5/9/2016  4:30:00PM 35 132 2016-05-09 16:30:00  30       11
31:  5/9/2016  5:00:00PM 27 129 2016-05-09 17:00:00  31       11
32:  5/9/2016  5:30:00PM 27 122 2016-05-09 17:30:00  32       11
33:  5/9/2016  6:00:00PM 28 121 2016-05-09 18:00:00  33       11
34:  5/9/2016  6:30:00PM 35 127 2016-05-09 18:30:00  34       11
35:  5/9/2016  7:00:00PM 34 132 2016-05-09 19:00:00  35       11
36:  5/9/2016  7:30:00PM 25 133 2016-05-09 19:30:00  36       11
37:  5/9/2016  8:00:00PM 34 119 2016-05-09 20:00:00  37       11
38:  5/9/2016  8:30:00PM 33 125 2016-05-09 20:30:00  38       12
39:  5/9/2016  9:00:00PM 29 120 2016-05-09 21:00:00  39       12
40:  5/9/2016  9:30:00PM 34 125 2016-05-09 21:30:00  40       12
41:  5/9/2016 10:00:00PM 31 118 2016-05-09 22:00:00  41       12
42:  5/9/2016 10:30:00PM 33 118 2016-05-09 22:30:00  42       12
43:  5/9/2016 11:00:00PM 28 115 2016-05-09 23:00:00  43       12
44:  5/9/2016 11:30:00PM 34 126 2016-05-09 23:30:00  44       12
45: 5/10/2016 12:00:00AM 34 131 2016-05-10 00:00:00  45       12
46: 5/10/2016 12:30:00AM 25 130 2016-05-10 00:30:00  46       12
47: 5/10/2016  1:00:00AM 28 129 2016-05-10 01:00:00  47       12
48: 5/10/2016  1:30:00AM 26 132 2016-05-10 01:30:00  48       12

答案 1 :(得分:2)

这也可以使用 non-equi join 来解决:

library(data.table)
as.data.table(df)[
  , ts :=  lubridate::mdy_hms(paste(Date, Time))][
    , cnt := .SD[.(t6 = ts - 6*60*60, t0 = ts), 
                 on = .(ts >= t6, ts < t0), .N, by = .EACHI]$N][]
         Date     Time ampm  X   Y                  ts cnt
 1:  5/9/2016 12:00:00   AM 30 125 2016-05-09 00:00:00   0
 2:  5/9/2016 12:30:00   AM 27 127 2016-05-09 00:30:00   1
 3:  5/9/2016  1:00:00   AM 30 133 2016-05-09 01:00:00   2
 4:  5/9/2016  1:30:00   AM 25 115 2016-05-09 01:30:00   3
 5:  5/9/2016  2:00:00   AM 26 116 2016-05-09 02:00:00   4
 6:  5/9/2016  3:00:00   AM 34 128 2016-05-09 03:00:00   5
 7:  5/9/2016  3:30:00   AM 34 120 2016-05-09 03:30:00   6
 8:  5/9/2016  4:00:00   AM 26 133 2016-05-09 04:00:00   7
 9:  5/9/2016  5:00:00   AM 35 119 2016-05-09 05:00:00   8
10:  5/9/2016  5:30:00   AM 35 126 2016-05-09 05:30:00   9
11:  5/9/2016  6:00:00   AM 35 118 2016-05-09 06:00:00  10
12:  5/9/2016  6:30:00   AM 28 120 2016-05-09 06:30:00  10
13:  5/9/2016  7:00:00   AM 29 123 2016-05-09 07:00:00  10
14:  5/9/2016  7:30:00   AM 30 125 2016-05-09 07:30:00  10
15:  5/9/2016  8:30:00   AM 35 116 2016-05-09 08:30:00   9
16:  5/9/2016  9:00:00   AM 32 123 2016-05-09 09:00:00  10
17:  5/9/2016  9:30:00   AM 33 135 2016-05-09 09:30:00  10
18:  5/9/2016 10:00:00   AM 32 127 2016-05-09 10:00:00  10
19:  5/9/2016 10:30:00   AM 26 122 2016-05-09 10:30:00  10
20:  5/9/2016 11:00:00   AM 28 122 2016-05-09 11:00:00  11
21:  5/9/2016 11:30:00   AM 31 129 2016-05-09 11:30:00  11
22:  5/9/2016 12:00:00   PM 29 127 2016-05-09 12:00:00  11
23:  5/9/2016 12:30:00   PM 26 120 2016-05-09 12:30:00  11
24:  5/9/2016  1:00:00   PM 32 126 2016-05-09 13:00:00  11
25:  5/9/2016  1:30:00   PM 26 117 2016-05-09 13:30:00  11
26:  5/9/2016  2:30:00   PM 32 133 2016-05-09 14:30:00  11
27:  5/9/2016  3:00:00   PM 31 120 2016-05-09 15:00:00  11
28:  5/9/2016  3:30:00   PM 35 129 2016-05-09 15:30:00  11
29:  5/9/2016  4:00:00   PM 30 121 2016-05-09 16:00:00  11
30:  5/9/2016  4:30:00   PM 35 132 2016-05-09 16:30:00  11
31:  5/9/2016  5:00:00   PM 27 129 2016-05-09 17:00:00  11
32:  5/9/2016  5:30:00   PM 27 122 2016-05-09 17:30:00  11
33:  5/9/2016  6:00:00   PM 28 121 2016-05-09 18:00:00  11
34:  5/9/2016  6:30:00   PM 35 127 2016-05-09 18:30:00  11
35:  5/9/2016  7:00:00   PM 34 132 2016-05-09 19:00:00  11
36:  5/9/2016  7:30:00   PM 25 133 2016-05-09 19:30:00  11
37:  5/9/2016  8:00:00   PM 34 119 2016-05-09 20:00:00  11
38:  5/9/2016  8:30:00   PM 33 125 2016-05-09 20:30:00  12
39:  5/9/2016  9:00:00   PM 29 120 2016-05-09 21:00:00  12
40:  5/9/2016  9:30:00   PM 34 125 2016-05-09 21:30:00  12
41:  5/9/2016 10:00:00   PM 31 118 2016-05-09 22:00:00  12
42:  5/9/2016 10:30:00   PM 33 118 2016-05-09 22:30:00  12
43:  5/9/2016 11:00:00   PM 28 115 2016-05-09 23:00:00  12
44:  5/9/2016 11:30:00   PM 34 126 2016-05-09 23:30:00  12
45: 5/10/2016 12:00:00   AM 34 131 2016-05-10 00:00:00  12
46: 5/10/2016 12:30:00   AM 25 130 2016-05-10 00:30:00  12
47: 5/10/2016  1:00:00   AM 28 129 2016-05-10 01:00:00  12
48: 5/10/2016  1:30:00   AM 26 132 2016-05-10 01:30:00  12
         Date     Time ampm  X   Y                  ts cnt