Get the minimum and maximum date and time per group in an R data.table

Time: 2017-06-08 19:44:48

Tags: r datetime data.table

I have a data.table dt that looks like this:

> library(data.table)
> dt <- data.table(grp = c(1,1,1,1,1, 2,2,2,2,2,3,3,3,3,3),
                 date = c("2017-04-24", "2017-04-25", "2017-04-26", "2017-04-27","2017-04-28", 
                          "2017-05-11", "2017-05-11", "2017-05-13", "2017-05-14","2017-05-15",
                          "2017-06-16", "2017-06-17", "2017-06-18", "2017-06-20", "2017-06-20"),
                 time = c(1817, 1902, 1704, 1404, 1152, 1344, 1455, 1235, 0844, 0744, 1439,1346, 1525,1211, 1333))
> dt
    grp       date time
 1:   1 2017-04-24 1817
 2:   1 2017-04-25 1902
 3:   1 2017-04-26 1704
 4:   1 2017-04-27 1404
 5:   1 2017-04-28 1152
 6:   2 2017-05-11 1344
 7:   2 2017-05-11 1455
 8:   2 2017-05-13 1235
 9:   2 2017-05-14  844
10:   2 2017-05-15  744
11:   3 2017-06-16 1439
12:   3 2017-06-17 1346
13:   3 2017-06-18 1525
14:   3 2017-06-20 1211
15:   3 2017-06-20 1333

I want to find the minimum and maximum date and time for each group "grp".

To find the minimum and maximum dates I did:

dt[,"min_date" := min(date), by=c("grp")]
dt[,"max_date" := max(date), by=c("grp")] 

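I also need the minimum and maximum time for each group "grp". The minimum time is the time of the record that has the minimum date, and the maximum time is the time of the record that has the maximum date.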

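If the maximum or minimum date is duplicated within a group, the maximum time must be the time of the last record with the maximum date, and the minimum time must be the time of the first record with the minimum date.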

My final result should look like this:

> dt
    grp       date time   min_date   max_date min_time max_time
 1:   1 2017-04-24 1817 2017-04-24 2017-04-28     1817     1152
 2:   1 2017-04-25 1902 2017-04-24 2017-04-28     1817     1152
 3:   1 2017-04-26 1704 2017-04-24 2017-04-28     1817     1152
 4:   1 2017-04-27 1404 2017-04-24 2017-04-28     1817     1152
 5:   1 2017-04-28 1152 2017-04-24 2017-04-28     1817     1152
 6:   2 2017-05-11 1344 2017-05-11 2017-05-15     1344      744
 7:   2 2017-05-11 1455 2017-05-11 2017-05-15     1344      744
 8:   2 2017-05-13 1235 2017-05-11 2017-05-15     1344      744
 9:   2 2017-05-14  844 2017-05-11 2017-05-15     1344      744
10:   2 2017-05-15  744 2017-05-11 2017-05-15     1344      744
11:   3 2017-06-16 1439 2017-06-16 2017-06-20     1439     1333
12:   3 2017-06-17 1346 2017-06-16 2017-06-20     1439     1333
13:   3 2017-06-18 1525 2017-06-16 2017-06-20     1439     1333
14:   3 2017-06-20 1211 2017-06-16 2017-06-20     1439     1333
15:   3 2017-06-20 1333 2017-06-16 2017-06-20     1439     1333

How can I do this in R with data.table?

1 Answer:

Answer 0 (score: 1)

This should work, given the min_date and max_date columns created in the question:

dt[,"min_time" := min(time[which(min_date==date)]), by=grp]
dt[,"max_time" := max(time[which(max_date==date)]), by=grp]
dt
    grp       date time   min_date   max_date min_time max_time
 1:   1 2017-04-24 1817 2017-04-24 2017-04-28     1817     1152
 2:   1 2017-04-25 1902 2017-04-24 2017-04-28     1817     1152
 3:   1 2017-04-26 1704 2017-04-24 2017-04-28     1817     1152
 4:   1 2017-04-27 1404 2017-04-24 2017-04-28     1817     1152
 5:   1 2017-04-28 1152 2017-04-24 2017-04-28     1817     1152
 6:   2 2017-05-11 1344 2017-05-11 2017-05-15     1344      744
 7:   2 2017-05-11 1455 2017-05-11 2017-05-15     1344      744
 8:   2 2017-05-13 1235 2017-05-11 2017-05-15     1344      744
 9:   2 2017-05-14  844 2017-05-11 2017-05-15     1344      744
10:   2 2017-05-15  744 2017-05-11 2017-05-15     1344      744
11:   3 2017-06-16 1439 2017-06-16 2017-06-20     1439     1333
12:   3 2017-06-17 1346 2017-06-16 2017-06-20     1439     1333
13:   3 2017-06-18 1525 2017-06-16 2017-06-20     1439     1333
14:   3 2017-06-20 1211 2017-06-16 2017-06-20     1439     1333
15:   3 2017-06-20 1333 2017-06-16 2017-06-20     1439     1333

Or in one line:

dt[, `:=`(min_time = min(time[which(min_date==date)]), max_time = max(time[which(max_date==date)])), by=grp]
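Note that min() and max() pick the smallest and largest time among the tied rows, which matches the expected output for this data but is not the same as "first record" and "last record" in general. A minimal sketch of a positional variant follows, assuming "first" and "last" mean row order within the table. The table dt2 and its values are hypothetical, chosen so that the two approaches give different answers.

```r
library(data.table)

# Hypothetical data: the first row at the minimum date does not hold the
# smallest time, and the last row at the maximum date does not hold the
# largest time.
dt2 <- data.table(
  grp  = c(1, 1, 1, 1),
  date = c("2017-05-11", "2017-05-11", "2017-05-15", "2017-05-15"),
  time = c(1455, 1344, 1333, 1211)
)

# Positional version: take the first time among rows at the minimum date
# and the last time among rows at the maximum date, in row order.
# The dates are ISO-formatted strings, so min() and max() on the character
# column order them chronologically.
dt2[, `:=`(
  min_date = min(date),
  max_date = max(date),
  min_time = head(time[date == min(date)], 1L),
  max_time = tail(time[date == max(date)], 1L)
), by = grp]

dt2
```

Here min_time is 1455 and max_time is 1211 for every row, whereas the min()/max() version would return 1344 and 1333. If the smallest and largest times among the tied rows are what is wanted, the answer above is the simpler choice.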