I have a data.table dt that looks like this:
library(data.table)
dt <- data.table(grp = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3),
                 date = c("2017-04-24", "2017-04-25", "2017-04-26", "2017-04-27", "2017-04-28",
                          "2017-05-11", "2017-05-11", "2017-05-13", "2017-05-14", "2017-05-15",
                          "2017-06-16", "2017-06-17", "2017-06-18", "2017-06-20", "2017-06-20"),
                 time = c(1817, 1902, 1704, 1404, 1152, 1344, 1455, 1235, 0844, 0744, 1439, 1346, 1525, 1211, 1333))
> dt
    grp       date time
 1:   1 2017-04-24 1817
 2:   1 2017-04-25 1902
 3:   1 2017-04-26 1704
 4:   1 2017-04-27 1404
 5:   1 2017-04-28 1152
 6:   2 2017-05-11 1344
 7:   2 2017-05-11 1455
 8:   2 2017-05-13 1235
 9:   2 2017-05-14  844
10:   2 2017-05-15  744
11:   3 2017-06-16 1439
12:   3 2017-06-17 1346
13:   3 2017-06-18 1525
14:   3 2017-06-20 1211
15:   3 2017-06-20 1333
I want to find the minimum and maximum date and time for each group "grp".
To find the minimum and maximum dates, I did:
dt[,"min_date" := min(date), by=c("grp")]
dt[,"max_date" := max(date), by=c("grp")]
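A side note on why the two lines above work: date is a character column here, and min() and max() on character vectors compare strings, which for zero-padded YYYY-MM-DD strings agrees with chronological order. A minimal sketch, assuming one would rather work with real dates, converts the column with data.table's as.IDate() first and sets both columns in one grouped assignment. The name dt2 and the reduced sample are only illustrative, not part of the original data.

```r
library(data.table)

# Illustrative subset of the data above; `dt2` is a hypothetical name.
dt2 <- data.table(grp  = c(1, 1, 2, 2),
                  date = c("2017-04-24", "2017-04-28", "2017-05-11", "2017-05-15"))

# Convert the character column to data.table's integer-backed date class.
dt2[, date := as.IDate(date)]

# Compute both group-wise extremes in a single grouped assignment.
dt2[, `:=`(min_date = min(date), max_date = max(date)), by = grp]

dt2
```

With a date class, comparisons no longer depend on the strings being consistently formatted.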
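I also need to compute the minimum and maximum time for each group "grp". The minimum time is the time of the record that has the minimum date, and the maximum time is the time of the record that has the maximum date.
If the minimum or maximum date is duplicated within a group, the minimum time should be the time of the first record with the minimum date, and the maximum time should be the time of the last record with the maximum date.
My final result should look like this: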
> dt
    grp       date time   min_date   max_date min_time max_time
 1:   1 2017-04-24 1817 2017-04-24 2017-04-28     1817     1152
 2:   1 2017-04-25 1902 2017-04-24 2017-04-28     1817     1152
 3:   1 2017-04-26 1704 2017-04-24 2017-04-28     1817     1152
 4:   1 2017-04-27 1404 2017-04-24 2017-04-28     1817     1152
 5:   1 2017-04-28 1152 2017-04-24 2017-04-28     1817     1152
 6:   2 2017-05-11 1344 2017-05-11 2017-05-15     1344      744
 7:   2 2017-05-11 1455 2017-05-11 2017-05-15     1344      744
 8:   2 2017-05-13 1235 2017-05-11 2017-05-15     1344      744
 9:   2 2017-05-14  844 2017-05-11 2017-05-15     1344      744
10:   2 2017-05-15  744 2017-05-11 2017-05-15     1344      744
11:   3 2017-06-16 1439 2017-06-16 2017-06-20     1439     1333
12:   3 2017-06-17 1346 2017-06-16 2017-06-20     1439     1333
13:   3 2017-06-18 1525 2017-06-16 2017-06-20     1439     1333
14:   3 2017-06-20 1211 2017-06-16 2017-06-20     1439     1333
15:   3 2017-06-20 1333 2017-06-16 2017-06-20     1439     1333
How can I do this in R with data.table?

Answer 0 (score: 1)
This should work, using the min_date and max_date columns created above:
dt[,"min_time" := min(time[which(min_date==date)]), by=grp]
dt[,"max_time" := max(time[which(max_date==date)]), by=grp]
dt
    grp       date time   min_date   max_date min_time max_time
 1:   1 2017-04-24 1817 2017-04-24 2017-04-28     1817     1152
 2:   1 2017-04-25 1902 2017-04-24 2017-04-28     1817     1152
 3:   1 2017-04-26 1704 2017-04-24 2017-04-28     1817     1152
 4:   1 2017-04-27 1404 2017-04-24 2017-04-28     1817     1152
 5:   1 2017-04-28 1152 2017-04-24 2017-04-28     1817     1152
 6:   2 2017-05-11 1344 2017-05-11 2017-05-15     1344      744
 7:   2 2017-05-11 1455 2017-05-11 2017-05-15     1344      744
 8:   2 2017-05-13 1235 2017-05-11 2017-05-15     1344      744
 9:   2 2017-05-14  844 2017-05-11 2017-05-15     1344      744
10:   2 2017-05-15  744 2017-05-11 2017-05-15     1344      744
11:   3 2017-06-16 1439 2017-06-16 2017-06-20     1439     1333
12:   3 2017-06-17 1346 2017-06-16 2017-06-20     1439     1333
13:   3 2017-06-18 1525 2017-06-16 2017-06-20     1439     1333
14:   3 2017-06-20 1211 2017-06-16 2017-06-20     1439     1333
15:   3 2017-06-20 1333 2017-06-16 2017-06-20     1439     1333
Or in a single line:
dt[, `:=`(min_time = min(time[which(min_date==date)]), max_time = max(time[which(max_date==date)])), by=grp]
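One caveat: min(time[...]) and max(time[...]) return the smallest and largest time among the rows that share the extreme date, whereas the question asks for the time of the first record with the minimum date and of the last record with the maximum date. On the sample data the two happen to coincide, but they need not in general. A minimal sketch of a positional variant follows, assuming the rows within each group are already in the intended order; the table ex is made up for illustration so that the two approaches differ.

```r
library(data.table)

# Made-up sample in which min/max of the tied times differ from first/last.
ex <- data.table(grp  = c(1, 1, 1, 1),
                 date = c("2017-05-11", "2017-05-11", "2017-05-15", "2017-05-15"),
                 time = c(1455, 1344, 1333, 1211))

ex[, `:=`(
  min_date = min(date),
  max_date = max(date),
  # time of the first row carrying the group's minimum date
  min_time = time[which(date == min(date))[1L]],
  # time of the last row carrying the group's maximum date
  max_time = time[tail(which(date == max(date)), 1L)]
), by = grp]

ex
```

Here min_time is 1455 and max_time is 1211, whereas the min()/max() version would give 1344 and 1333 for the same rows.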