Question

My data looks like this:

   DeviceId               VIN Latitude Longitude            TrueTime
1 S5353532523              XY1  37.51874 -87.47483 2016-06-05 18:46:00
2 S5353532523              XY1  37.52975 -87.47588 2016-06-05 18:46:00
3 S5353532523              XY1  37.53472 -87.47734 2016-06-05 18:47:00
4 S5353532523              XY1  37.53769 -87.47846 2016-06-05 18:47:00
5 S5353532523              XY1  37.54271 -87.47963 2016-06-05 18:47:00
6 S5353532523              XY1  37.54780 -87.47942 2016-06-05 18:47:00
...

I want to group this data into trips. Using dplyr, I started with:

 Data %>% group_by(VIN, DeviceId) %>% ?

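But I'm not sure what to put in place of the question mark. Basically, I want to add a column that assigns a tripID, starting at 1, which increments whenever the time gap from the previous row is greater than 5 minutes.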

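So whenever TrueTime jumps by more than 5 minutes, the trip counter goes up by 1. In addition, it needs to keep incrementing across VIN and DeviceId (so the counter should not reset to 1 at the start of each group).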

Answer 1

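We can use difftime to take the difference between adjacent elements within each group, with units set to "mins", compare it with 5 to get a logical index, and take the cumsum of that index to create 'TripID':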

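library(dplyr)

Data %>% 
      group_by(VIN, DeviceId) %>% 
      # TRUE for the first row of each group and for every row that comes more than
      # 5 minutes after the previous one; cumsum turns these flags into trip numbers
      mutate(TripID = cumsum(c(TRUE, difftime(TrueTime[-1], 
                             TrueTime[-n()], units = "mins") > 5)))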
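The core idiom does not depend on dplyr. Here is a minimal base-R sketch on a hypothetical vector of already sorted timestamps (the values and variable names are made up for illustration): each gap to the previous element is compared with 5 minutes, the first element is always TRUE, and cumsum turns the TRUE flags into consecutive trip numbers.

```r
# Hypothetical, already sorted timestamps (offsets in minutes: 0, 0, 1, 1, 8, 8, 20)
times <- as.POSIXct("2016-06-05 18:46:00", tz = "UTC") +
  60 * c(0, 0, 1, 1, 8, 8, 20)

# Gap to the previous element in minutes; the first element has no predecessor
gap_mins <- as.numeric(difftime(times[-1], times[-length(times)], units = "mins"))

# TRUE marks the first row of a new trip; cumsum counts the trips seen so far
new_trip <- c(TRUE, gap_mins > 5)
trip_id <- cumsum(new_trip)
print(trip_id)  # 1 1 1 1 2 2 3
```

Because the comparison is a strict `> 5`, a gap of exactly 5 minutes stays in the same trip; use `>= 5` if it should start a new one.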

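It is not clear what you mean by the reset part. If a new trip should start whenever 'TrueTime' jumps by more than 5 minutes, regardless of group, we don't need group_by: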

Data %>%
       mutate(TripID = cumsum(c(TRUE, difftime(TrueTime[-1], 
                     TrueTime[-nrow(Data)], units = "mins")>5)))
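The ungrouped version compares each row with the row directly above it, so it depends on row order. A base-R sketch using the TrueTime values and VINs from the sample data in this post shows the effect: the first XY2 row is compared with the last XY1 row, the gap is negative (-8 minutes), which is not greater than 5, so XY2's first two rows are counted as part of XY1's second trip. Variable names are illustrative.

```r
# TrueTime values (seconds since the epoch) and VINs taken from the sample data
secs <- c(1465132560, 1465132560, 1465132620, 1465132620, 1465133040, 1465133040,
          1465132560, 1465132560, 1465133100, 1465133160, 1465133160, 1465133160)
vin <- rep(c("XY1", "XY2"), each = 6)
true_time <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Ungrouped: every row is compared with the previous row of the whole table
gap_mins <- as.numeric(difftime(true_time[-1], true_time[-length(true_time)],
                                units = "mins"))
trip_id <- cumsum(c(TRUE, gap_mins > 5))

# Rows 7 and 8 (the first XY2 rows) end up with trip 2, shared with XY1
print(data.frame(vin, trip_id))
```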

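Or, if the numbering should keep increasing across groups instead of restarting in each one, add that step after the group_by: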

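Data %>% 
     arrange(VIN, DeviceId, TrueTime) %>%
     group_by(VIN, DeviceId) %>%
     # flag the first row of each group and every gap of more than 5 minutes
     mutate(NewTrip = c(TRUE, difftime(TrueTime[-1], 
              TrueTime[-n()], units = "mins") > 5)) %>%
     ungroup() %>% 
     # a running total over the whole table keeps counting across groups
     mutate(TripID = cumsum(NewTrip)) %>%
     select(-NewTrip)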
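For comparison, here is a base-R sketch of a counter that never resets between groups. It swaps in `ave()` instead of dplyr: the trip-start flags are computed within each (VIN, DeviceId) group, and the running total is then taken over the whole, sorted table, so no trip number is shared by two groups. The data frame `df` is a hypothetical reconstruction of the sample data's DeviceId, VIN and TrueTime columns.

```r
secs <- c(1465132560, 1465132560, 1465132620, 1465132620, 1465133040, 1465133040,
          1465132560, 1465132560, 1465133100, 1465133160, 1465133160, 1465133160)
df <- data.frame(
  DeviceId = "S5353532523",
  VIN = rep(c("XY1", "XY2"), each = 6),
  TrueTime = as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
)

# Sort so each (VIN, DeviceId) group is contiguous and in time order
df <- df[order(df$VIN, df$DeviceId, df$TrueTime), ]

# Per-group flag: 1 on a group's first row or after a gap of more than 5 minutes
df$NewTrip <- ave(as.numeric(df$TrueTime), df$VIN, df$DeviceId,
                  FUN = function(t) c(1, diff(t) > 5 * 60))

# Running total over the whole table, so the counter continues into the next group
df$TripID <- cumsum(df$NewTrip)
print(df[, c("VIN", "TrueTime", "TripID")])
```

With the sample data this gives trips 1 and 2 for XY1 and trips 3 and 4 for XY2.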

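Note: this assumes 'TrueTime' is of class POSIXct and that the rows are already sorted by TrueTime (within each VIN/DeviceId group when grouping).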
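If TrueTime was read in as text (for example from a CSV file), it can be converted first. A minimal base-R sketch with hypothetical timestamp strings; the format string matches the layout shown in the question, and `tz = "UTC"` is an assumption (use the zone the data was recorded in):

```r
# Hypothetical timestamps stored as character strings
raw <- c("2016-06-05 18:46:00", "2016-06-05 18:47:00", "2016-06-05 18:54:00")

# Parse into POSIXct so that difftime works on real date-times
true_time <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
stopifnot(inherits(true_time, "POSIXct"))

# Gaps between consecutive timestamps, in minutes
gap_mins <- as.numeric(difftime(true_time[-1], true_time[-length(true_time)],
                                units = "mins"))
print(gap_mins)  # 1 7
```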

Data

Data <- structure(list(DeviceId = c("S5353532523", "S5353532523", "S5353532523", 
"S5353532523", "S5353532523", "S5353532523", "S5353532523", "S5353532523", 
"S5353532523", "S5353532523", "S5353532523", "S5353532523"), 
VIN = c("XY1", "XY1", "XY1", "XY1", "XY1", "XY1", "XY2", 
"XY2", "XY2", "XY2", "XY2", "XY2"), Latitude = c(37.51874, 
37.52975, 37.53472, 37.53769, 37.54271, 37.5478, 37.51874, 
37.52975, 37.53472, 37.53769, 37.54271, 37.5478), Longitude = c(-87.47483, 
-87.47588, -87.47734, -87.47846, -87.47963, -87.47942, -87.47483, 
-87.47588, -87.47734, -87.47846, -87.47963, -87.47942), TrueTime = structure(c(1465132560, 
1465132560, 1465132620, 1465132620, 1465133040, 1465133040, 
1465132560, 1465132560, 1465133100, 1465133160, 1465133160, 
1465133160), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("DeviceId", 
"VIN", "Latitude", "Longitude", "TrueTime"), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")

Grouping data into trips in R
