根据另一列计算一列的时差

时间:2019-03-14 12:39:06

标签: r dplyr

我有一些看起来像这样的数据:

> df
                  time region place action
1  2019-01-14 16:00:08      A     G  START
2  2019-01-14 16:00:08      A     I   STOP
3  2019-01-14 16:00:16      A     H  START
4  2019-01-14 16:00:16      A     G   STOP
5  2019-01-14 16:01:40      A     H   STOP
6  2019-01-14 16:01:40      A     G  START
7  2019-01-14 16:01:54      A     G   STOP
8  2019-01-14 16:02:21      A     D  START
9  2019-01-14 16:02:31      A     C  START
10 2019-01-14 16:02:54      A     D   STOP
11 2019-01-14 16:03:12      A     C   STOP
12 2019-01-14 16:03:13      A     E  START
13 2019-01-14 16:03:34      A     E   STOP
14 2019-01-14 16:03:34      A     A  START
15 2019-01-14 16:04:12      A     A   STOP
16 2019-01-14 16:04:12      A     E  START
17 2019-01-14 16:04:17      A     E   STOP
18 2019-01-14 16:04:55      A     F  START
19 2019-01-14 16:05:08      A     B  START
20 2019-01-14 16:05:08      A     F   STOP

我希望每个区域的位置之间的开始和停止之间有区别。 (例如,第1行的时间与第4行的时间之间的时差,因为它是该地点最近的STOP)。到目前为止,这就是我所要做的:在ifelse-construct中尝试做的是在动作列中找到带有STOP的下一行。

df %>% group_by(region, place) %>%
  mutate(difference = ifelse(action == "STOP", NA, time[which(action == "STOP")[which.max(which(action == "STOP") > row_number())]] - time))

# A tibble: 20 x 5
# Groups:   region, place [9]
   time                region place action difference
   <dttm>              <fct>  <fct> <chr>       <dbl>
 1 2019-01-14 16:00:08 A      g     START        8.52
 2 2019-01-14 16:00:08 A      i     STOP        NA   
 3 2019-01-14 16:00:16 A      h     START       84.2 
 4 2019-01-14 16:00:16 A      g     STOP        NA   
 5 2019-01-14 16:01:40 A      h     STOP        NA   
 6 2019-01-14 16:01:40 A      g     START      -84.2 
 7 2019-01-14 16:01:54 A      g     STOP        NA   
 8 2019-01-14 16:02:21 A      d     START       32.9 
 9 2019-01-14 16:02:31 A      c     START       40.8 
10 2019-01-14 16:02:54 A      d     STOP        NA   
11 2019-01-14 16:03:12 A      c     STOP        NA   
12 2019-01-14 16:03:13 A      e     START       21.3 
13 2019-01-14 16:03:34 A      e     STOP        NA   
14 2019-01-14 16:03:34 A      a     START       38.0 
15 2019-01-14 16:04:12 A      a     STOP        NA   
16 2019-01-14 16:04:12 A      e     START      -38.5 
17 2019-01-14 16:04:17 A      e     STOP        NA   
18 2019-01-14 16:04:55 A      f     START       13.4 
19 2019-01-14 16:05:08 A      b     START       NA   
20 2019-01-14 16:05:08 A      f     STOP        NA  

除了两个负数(它们分别为〜14和〜5)以外,时间差是正确的。有人知道为什么会这样吗?谢谢!

数据:

df <- structure(
  list(
    time = structure(
      c(
        1547478008.024,
        1547478008.225,
        1547478016.168,
        1547478016.542,
        1547478100.374,
        1547478100.758,
        1547478114.589,
        1547478141.86,
        1547478151.972,
        1547478174.757,
        1547478192.723,
        1547478193.077,
        1547478214.37,
        1547478214.562,
        1547478252.523,
        1547478252.907,
        1547478257.458,
        1547478295.109,
        1547478308.358,
        1547478308.547
      ),
      class = c("POSIXct", "POSIXt"),
      tzone = ""
    ),
    region = structure(
      c(
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L,
        1L
      ),
      .Label = "A",
      class = "factor"
    ),
    place = structure(
      c(
        7L,
        9L,
        8L,
        7L,
        8L,
        7L,
        7L,
        4L,
        3L,
        4L,
        3L,
        5L,
        5L,
        1L,
        1L,
        5L,
        5L,
        6L,
        2L,
        6L
      ),
      .Label = c("a",
                 "b", "c", "d", "e", "f", "g", "h", "i"),
      class = "factor"
    ),
    action = c(
      "START",
      "STOP",
      "START",
      "STOP",
      "STOP",
      "START",
      "STOP",
      "START",
      "START",
      "STOP",
      "STOP",
      "START",
      "STOP",
      "START",
      "STOP",
      "START",
      "STOP",
      "START",
      "START",
      "STOP"
    )
  ),
  row.names = c(NA, 20L),
  class = "data.frame"
)

1 个答案:

答案 0 :(得分:2)

假设每次启动之后和下一次启动之前总是存在一个STOP,那么它将起作用:

fun getAll(): Observable<List<User>> = Observable.fromArray(
    // from db
    userDAO
        .selectAll()
        .subscribeOn(ioScheduler),
    // from api
    userAPI
       .getAll()
       // --- this makes no sense by the way -------------------
       .doOnNext { lstUser -> Completable.concatArray(
            userDAO.deleteAll().subscribeOn(ioScheduler),
            userDAO.save(lstUser).subscribeOn(ioScheduler)
       )}
       // ------------------------------------------------------
       .subscribeOn(ioScheduler)
)
.concatMapEagerDelayError({ v -> v }, true)

注意:如果您100%确信该假设是正确的,则可以使用以下代码,它添加另一个df %>% group_by(region, place) %>% arrange(time) %>% mutate(difference=ifelse(action=="STOP", NA, difftime(lead(time),time,units="secs"))) # A tibble: 20 x 5 # Groups: region, place [9] time region place action difference <dttm> <fct> <fct> <chr> <dbl> 1 2019-01-14 10:00:08 A g START 8.52 2 2019-01-14 10:00:08 A i STOP NA 3 2019-01-14 10:00:16 A h START 84.2 4 2019-01-14 10:00:16 A g STOP NA 5 2019-01-14 10:01:40 A h STOP NA 6 2019-01-14 10:01:40 A g START 13.8 7 2019-01-14 10:01:54 A g STOP NA 8 2019-01-14 10:02:21 A d START 32.9 9 2019-01-14 10:02:31 A c START 40.8 10 2019-01-14 10:02:54 A d STOP NA 11 2019-01-14 10:03:12 A c STOP NA 12 2019-01-14 10:03:13 A e START 21.3 13 2019-01-14 10:03:34 A e STOP NA 14 2019-01-14 10:03:34 A a START 38.0 15 2019-01-14 10:04:12 A a STOP NA 16 2019-01-14 10:04:12 A e START 4.55 17 2019-01-14 10:04:17 A e STOP NA 18 2019-01-14 10:04:55 A f START 13.4 19 2019-01-14 10:05:08 A b START NA 20 2019-01-14 10:05:08 A f STOP NA 以确保START之后的下一行是STOP,否则给出NA:

ifelse