Find the difference between timestamps in R based on a common ID in another column

Time: 2017-10-26 11:10:32

Tags: r dplyr

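The dataset below has a timestamp in its third column: dates in mm/dd/yyyy format with times in 24-hour format, all in January. Using R, I want to find the difference between each row's timestamp and that of the previous row, but only when both rows have the same patient value. In other words, for each of the patients "1", "2" and "3", every timestamp should be compared with the previous row of the same patient. This also means the first row of each patient should give a value of 0 minutes, because there is nothing to compare it with. Thanks, please help.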

patient  handling                time
1        Registration            1/2/2017 11:41
1        Triage and Assessment   1/2/2017 12:40
1        Registration            1/2/2017 12:40
1        Triage and Assessment   1/2/2017 22:32
1        Blood test              1/5/2017 8:59
1        Blood test              1/5/2017 14:34
1        MRI SCAN                1/5/2017 21:37
2        X-Ray                   1/7/2017  4:31
2        X-Ray                   1/7/2017  7:57
2        Discuss Results         1/7/2017 14:45
2        Discuss Results         1/7/2017 17:55
2        Check-out               1/9/2017 17:09
2        Check-out               1/9/2017 19:14
3        Registration            1/4/2017  1:34
3        Registration            1/4/2017  6:36
3        Triage and Assessment   1/4/2017 17:49
3        Triage and Assessment   1/5/2017 8:59
3        Blood test              1/5/2017 21:37
3        Blood test              1/6/2017 3:53

2 answers:

Answer 0: (score: 4)

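If time is already of class POSIXct and the data frame is already ordered by patient and time, the time difference in minutes can be appended with a streamlined version of SBista's answer: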

library(dplyr)
DF %>% 
  group_by(patient) %>% 
  mutate(delta = difftime(time, lag(time, default = first(time)), units = "mins")) 
 # A tibble: 19 x 4
 # Groups:   patient [3]
   patient              handling                time     delta
     <chr>                 <chr>              <dttm>    <time>
 1       1          Registration 2017-01-02 11:41:00    0 mins
 2       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3       1          Registration 2017-01-02 12:40:00    0 mins
 4       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5       1            Blood test 2017-01-05 08:59:00 3507 mins
 6       1            Blood test 2017-01-05 14:34:00  335 mins
 7       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9       2                 X-Ray 2017-01-07 07:57:00  206 mins
10       2       Discuss Results 2017-01-07 14:45:00  408 mins
11       2       Discuss Results 2017-01-07 17:55:00  190 mins
12       2             Check-out 2017-01-09 17:09:00 2834 mins
13       2             Check-out 2017-01-09 19:14:00  125 mins
14       3          Registration 2017-01-04 01:34:00    0 mins
15       3          Registration 2017-01-04 06:36:00  302 mins
16       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18       3            Blood test 2017-01-05 21:37:00  758 mins
19       3            Blood test 2017-01-06 03:53:00  376 mins
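
The code above assumes that time is already POSIXct and that the rows are sorted. If the column still holds text, as in the question, a preparation step along the following lines might be run first. This is only a minimal sketch: the four-row DF is a hypothetical subset of the question's data (deliberately unsorted), and tz = "UTC" is an assumption made here so that the result does not depend on the local time zone.

```r
library(dplyr)

# Hypothetical subset of the question's data; time is still text and unsorted
DF <- data.frame(
  patient  = c(1, 1, 2, 2),
  handling = c("Triage and Assessment", "Registration", "X-Ray", "X-Ray"),
  time     = c("1/2/2017 12:40", "1/2/2017 11:41", "1/7/2017 7:57", "1/7/2017 4:31"),
  stringsAsFactors = FALSE
)

DF <- DF %>%
  # parse mm/dd/yyyy HH:MM text; tz = "UTC" is an assumption, not from the question
  mutate(time = as.POSIXct(time, format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  # order by patient, then chronologically within each patient
  arrange(patient, time)

# same grouped difference as in the answer, applied to the prepared data
res <- DF %>%
  group_by(patient) %>%
  mutate(delta = difftime(time, lag(time, default = first(time)), units = "mins"))
```

After this step the grouped difftime() call has the input it expects, with each patient's rows in chronological order.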

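Another approach is to compute delta for all rows, ignoring the grouping by patient, and then replace the first value of each patient with zero, as the OP requested. Ignoring the groups at first might bring a performance gain (not verified).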

Unfortunately, I am not fluent enough in dplyr syntax to implement this, so I use data.table and its update by reference:

library(data.table)
setDT(DF)[, delta := difftime(time, shift(time), units = "mins")][]
DF[DF[, first(.I), by = patient]$V1, delta := 0][]
    patient              handling                time     delta
 1:       1          Registration 2017-01-02 11:41:00    0 mins
 2:       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3:       1          Registration 2017-01-02 12:40:00    0 mins
 4:       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5:       1            Blood test 2017-01-05 08:59:00 3507 mins
 6:       1            Blood test 2017-01-05 14:34:00  335 mins
 7:       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8:       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9:       2                 X-Ray 2017-01-07 07:57:00  206 mins
10:       2       Discuss Results 2017-01-07 14:45:00  408 mins
11:       2       Discuss Results 2017-01-07 17:55:00  190 mins
12:       2             Check-out 2017-01-09 17:09:00 2834 mins
13:       2             Check-out 2017-01-09 19:14:00  125 mins
14:       3          Registration 2017-01-04 01:34:00    0 mins
15:       3          Registration 2017-01-04 06:36:00  302 mins
16:       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17:       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18:       3            Blood test 2017-01-05 21:37:00  758 mins
19:       3            Blood test 2017-01-06 03:53:00  376 mins
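
For readers who prefer dplyr, the same idea (difference over all rows first, then reset each patient's first row) could perhaps be sketched as follows. This is not part of the original answer. The five-row DF and the helper column is_first are hypothetical, tz = "UTC" is an assumption, and delta is converted to plain numeric minutes here to keep the replacement step simple.

```r
library(dplyr)

# Hypothetical subset of the question's data, already POSIXct and sorted
DF <- data.frame(
  patient = c(1, 1, 1, 2, 2),
  time = as.POSIXct(c("2017-01-02 11:41", "2017-01-02 12:40", "2017-01-02 22:32",
                      "2017-01-07 04:31", "2017-01-07 07:57"), tz = "UTC")
)

result <- DF %>%
  mutate(
    # difference to the previous row, ignoring patient boundaries
    delta = as.numeric(difftime(time, lag(time), units = "mins")),
    # TRUE on the very first row and wherever the patient changes
    is_first = is.na(lag(patient)) | patient != lag(patient),
    # reset the first row of each patient to zero
    delta = if_else(is_first, 0, delta)
  ) %>%
  select(-is_first)
```

Whether this is faster than the grouped version is just as unverified as the claim above. Like the data.table code, it relies on the rows of each patient being contiguous.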

Answer 1: (score: 2)

You can do the following:

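library(dplyr)

data %>%
  group_by(patient) %>%
  # time is still text here, so parse it as mm/dd/yyyy HH:MM before differencing;
  # units = "secs" is explicit so the unit does not depend on the data
  mutate(diff_in_sec = difftime(as.POSIXct(time, format = "%m/%d/%Y %H:%M"),
                                lag(as.POSIXct(time, format = "%m/%d/%Y %H:%M"),
                                    default = first(as.POSIXct(time, format = "%m/%d/%Y %H:%M"))),
                                units = "secs")) %>%
  # convert seconds to plain numeric minutes
  mutate(diff_in_min = as.numeric(diff_in_sec / 60))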

The output is:

 # A tibble: 19 x 5
# Groups:   patient [3]
   patient              handling           time diff_in_sec diff_in_min
     <int>                 <chr>          <chr>      <time>       <dbl>
 1       1          Registration 1/2/2017 11:41      0 secs           0
 2       1 Triage and Assessment 1/2/2017 12:40   3540 secs          59
 3       1          Registration 1/2/2017 12:40      0 secs           0
 4       1 Triage and Assessment 1/2/2017 22:32  35520 secs         592
 5       1            Blood test  1/5/2017 8:59 210420 secs        3507
 6       1            Blood test 1/5/2017 14:34  20100 secs         335
 7       1              MRI SCAN 1/5/2017 21:37  25380 secs         423
 8       2                 X-Ray  1/7/2017 4:31      0 secs           0
 9       2                 X-Ray  1/7/2017 7:57  12360 secs         206
10       2       Discuss Results 1/7/2017 14:45  24480 secs         408
11       2       Discuss Results 1/7/2017 17:55  11400 secs         190
12       2             Check-out 1/9/2017 17:09 170040 secs        2834
13       2             Check-out 1/9/2017 19:14   7500 secs         125
14       3          Registration  1/4/2017 1:34      0 secs           0
15       3          Registration  1/4/2017 6:36  18120 secs         302
16       3 Triage and Assessment 1/4/2017 17:49  40380 secs         673
17       3 Triage and Assessment  1/5/2017 8:59  54600 secs         910
18       3            Blood test 1/5/2017 21:37  45480 secs         758
19       3            Blood test  1/6/2017 3:53  22560 secs         376
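
The relation between the two result columns can be checked by hand for a single pair of timestamps, here rows 4 and 5 of the output. The sketch below is only an illustration, with tz = "UTC" as an assumption. It also shows that plain subtraction of two POSIXct values lets R choose the unit automatically, which is why an explicit units argument to difftime() is the safer choice whenever the number itself is used afterwards.

```r
# Two timestamps taken from rows 4 and 5 of the question's data
t1 <- as.POSIXct("1/2/2017 22:32", format = "%m/%d/%Y %H:%M", tz = "UTC")
t2 <- as.POSIXct("1/5/2017 8:59",  format = "%m/%d/%Y %H:%M", tz = "UTC")

# explicit unit: always seconds, whatever the size of the gap
secs <- as.numeric(difftime(t2, t1, units = "secs"))
mins <- secs / 60

# plain subtraction picks the unit itself; for a gap of more than a day it is "days"
auto_unit <- units(t2 - t1)

print(c(secs = secs, mins = mins))
print(auto_unit)
```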