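The dataset below has a timestamp in its third column: a date in mm/dd/yyyy format followed by a time in 24-hour format, all within January. Using R, I want to find the difference in minutes between each row and the row before it, but only when both rows belong to the same patient, i.e. the row-to-row differences should be computed separately for patients "1", "2" and "3". This also means that the first row should give a value of 0 minutes, because there is nothing to compare it with. Thanks, please help.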
patient handling time
1 Registration 1/2/2017 11:41
1 Triage and Assessment 1/2/2017 12:40
1 Registration 1/2/2017 12:40
1 Triage and Assessment 1/2/2017 22:32
1 Blood test 1/5/2017 8:59
1 Blood test 1/5/2017 14:34
1 MRI SCAN 1/5/2017 21:37
2 X-Ray 1/7/2017 4:31
2 X-Ray 1/7/2017 7:57
2 Discuss Results 1/7/2017 14:45
2 Discuss Results 1/7/2017 17:55
2 Check-out 1/9/2017 17:09
2 Check-out 1/9/2017 19:14
3 Registration 1/4/2017 1:34
3 Registration 1/4/2017 6:36
3 Triage and Assessment 1/4/2017 17:49
3 Triage and Assessment 1/5/2017 8:59
3 Blood test 1/5/2017 21:37
3 Blood test 1/6/2017 3:53
Answer 0 (score: 4)
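If time is already of class POSIXct and the data frame is already sorted by patient and time, you can use a streamlined version of SBista's answer that additionally reports the time difference in minutes: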
library(dplyr)
DF %>%
group_by(patient) %>%
mutate(delta = difftime(time, lag(time, default = first(time)), units = "mins"))
# A tibble: 19 x 4
# Groups:   patient [3]
   patient handling              time                delta
   <chr>   <chr>                 <dttm>              <time>
 1 1       Registration          2017-01-02 11:41:00    0 mins
 2 1       Triage and Assessment 2017-01-02 12:40:00   59 mins
 3 1       Registration          2017-01-02 12:40:00    0 mins
 4 1       Triage and Assessment 2017-01-02 22:32:00  592 mins
 5 1       Blood test            2017-01-05 08:59:00 3507 mins
 6 1       Blood test            2017-01-05 14:34:00  335 mins
 7 1       MRI SCAN              2017-01-05 21:37:00  423 mins
 8 2       X-Ray                 2017-01-07 04:31:00    0 mins
 9 2       X-Ray                 2017-01-07 07:57:00  206 mins
10 2       Discuss Results       2017-01-07 14:45:00  408 mins
11 2       Discuss Results       2017-01-07 17:55:00  190 mins
12 2       Check-out             2017-01-09 17:09:00 2834 mins
13 2       Check-out             2017-01-09 19:14:00  125 mins
14 3       Registration          2017-01-04 01:34:00    0 mins
15 3       Registration          2017-01-04 06:36:00  302 mins
16 3       Triage and Assessment 2017-01-04 17:49:00  673 mins
17 3       Triage and Assessment 2017-01-05 08:59:00  910 mins
18 3       Blood test            2017-01-05 21:37:00  758 mins
19 3       Blood test            2017-01-06 03:53:00  376 mins
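The pipeline above relies on time being POSIXct and on the rows being sorted. If the column is still a character string as in the question, a minimal base R sketch of that preparation step could look like the following. The small DF here is a hypothetical excerpt of the question's data, and tz = "UTC" is an assumption made only to keep daylight-saving shifts out of the differences.

```r
# Hypothetical excerpt of the question's data (first rows of patients 1 and 2)
DF <- data.frame(
  patient  = c(1, 1, 1, 2, 2),
  handling = c("Registration", "Triage and Assessment", "Registration",
               "X-Ray", "X-Ray"),
  time     = c("1/2/2017 11:41", "1/2/2017 12:40", "1/2/2017 12:40",
               "1/7/2017 4:31", "1/7/2017 7:57"),
  stringsAsFactors = FALSE
)

# Parse the mm/dd/yyyy HH:MM strings into POSIXct
# (tz = "UTC" is an assumption, not something stated in the question)
DF$time <- as.POSIXct(DF$time, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Sort by patient, then by time within each patient
DF <- DF[order(DF$patient, DF$time), ]

class(DF$time)
```

After this step DF$time can be passed to difftime() or lag() as in the answer.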
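Another approach is to compute delta for all rows while ignoring the grouping by patient, and then replace the first value of each patient with zero, as the OP requested. Ignoring the groups at first might bring a performance gain (not verified).
Unfortunately, I am not fluent enough in dplyr syntax to implement this, so I use data.table with its update by reference: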
library(data.table)
setDT(DF)[, delta := difftime(time, shift(time), units = "mins")][]
DF[DF[, first(.I), by = patient]$V1, delta := 0][]
    patient              handling                time     delta
 1:       1          Registration 2017-01-02 11:41:00    0 mins
 2:       1 Triage and Assessment 2017-01-02 12:40:00   59 mins
 3:       1          Registration 2017-01-02 12:40:00    0 mins
 4:       1 Triage and Assessment 2017-01-02 22:32:00  592 mins
 5:       1            Blood test 2017-01-05 08:59:00 3507 mins
 6:       1            Blood test 2017-01-05 14:34:00  335 mins
 7:       1              MRI SCAN 2017-01-05 21:37:00  423 mins
 8:       2                 X-Ray 2017-01-07 04:31:00    0 mins
 9:       2                 X-Ray 2017-01-07 07:57:00  206 mins
10:       2       Discuss Results 2017-01-07 14:45:00  408 mins
11:       2       Discuss Results 2017-01-07 17:55:00  190 mins
12:       2             Check-out 2017-01-09 17:09:00 2834 mins
13:       2             Check-out 2017-01-09 19:14:00  125 mins
14:       3          Registration 2017-01-04 01:34:00    0 mins
15:       3          Registration 2017-01-04 06:36:00  302 mins
16:       3 Triage and Assessment 2017-01-04 17:49:00  673 mins
17:       3 Triage and Assessment 2017-01-05 08:59:00  910 mins
18:       3            Blood test 2017-01-05 21:37:00  758 mins
19:       3            Blood test 2017-01-06 03:53:00  376 mins
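For readers without data.table, the same two-step idea (difference every row first, then zero out the first row of each patient) can be sketched in base R. This is not the answerer's code, just an illustration; the small DF is a hypothetical excerpt, tz = "UTC" is an assumption, and delta is stored as plain numeric minutes rather than as a difftime. It assumes the rows are already sorted by patient.

```r
# Hypothetical excerpt: three rows of patient 1, two rows of patient 2
DF <- data.frame(
  patient = c(1, 1, 1, 2, 2),
  time = as.POSIXct(
    c("2017-01-02 11:41", "2017-01-02 12:40", "2017-01-02 22:32",
      "2017-01-07 04:31", "2017-01-07 07:57"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"  # tz is an assumption
  )
)

# Step 1: difference to the previous row in minutes, ignoring patient;
# the very first row has no predecessor, so it starts at 0
DF$delta <- c(0, diff(as.numeric(DF$time)) / 60)

# Step 2: reset the first row of every patient to 0
# (!duplicated() marks the first occurrence of each patient value)
DF$delta[!duplicated(DF$patient)] <- 0

DF$delta
# 0 59 592 0 206
```

The difference computed across the patient boundary in step 1 is simply overwritten in step 2, which is what makes it safe to ignore the groups at first.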
Answer 1 (score: 2)
You can do the following:
library(dplyr)
data %>%
  group_by(patient) %>%
  # Parse the character timestamps and subtract the previous row's time within each patient
  mutate(diff_in_sec = as.POSIXct(time, format = "%m/%d/%Y %H:%M") -
           lag(as.POSIXct(time, format = "%m/%d/%Y %H:%M"),
               default = first(as.POSIXct(time, format = "%m/%d/%Y %H:%M")))) %>%
  mutate(diff_in_min = as.numeric(diff_in_sec / 60))
The output is:
# A tibble: 19 x 5
# Groups: patient [3]
patient handling time diff_in_sec diff_in_min
<int> <chr> <chr> <time> <dbl>
1 1 Registration 1/2/2017 11:41 0 secs 0
2 1 Triage and Assessment 1/2/2017 12:40 3540 secs 59
3 1 Registration 1/2/2017 12:40 0 secs 0
4 1 Triage and Assessment 1/2/2017 22:32 35520 secs 592
5 1 Blood test 1/5/2017 8:59 210420 secs 3507
6 1 Blood test 1/5/2017 14:34 20100 secs 335
7 1 MRI SCAN 1/5/2017 21:37 25380 secs 423
8 2 X-Ray 1/7/2017 4:31 0 secs 0
9 2 X-Ray 1/7/2017 7:57 12360 secs 206
10 2 Discuss Results 1/7/2017 14:45 24480 secs 408
11 2 Discuss Results 1/7/2017 17:55 11400 secs 190
12 2 Check-out 1/9/2017 17:09 170040 secs 2834
13 2 Check-out 1/9/2017 19:14 7500 secs 125
14 3 Registration 1/4/2017 1:34 0 secs 0
15 3 Registration 1/4/2017 6:36 18120 secs 302
16 3 Triage and Assessment 1/4/2017 17:49 40380 secs 673
17 3 Triage and Assessment 1/5/2017 8:59 54600 secs 910
18 3 Blood test 1/5/2017 21:37 45480 secs 758
19 3 Blood test 1/6/2017 3:53 22560 secs 376
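One thing to be aware of: subtracting two POSIXct vectors with - lets R pick the units of the result automatically, so dividing by 60 gives minutes only when the result happens to be in seconds, as it is in the output above. A hedged base R sketch that sidesteps this by working on the numeric seconds directly is shown below. The small data frame is a hypothetical excerpt of the question's data and tz = "UTC" is an assumption.

```r
# Hypothetical excerpt with time still stored as character
data <- data.frame(
  patient = c(1, 1, 2, 2),
  time    = c("1/2/2017 11:41", "1/2/2017 12:40",
              "1/7/2017 4:31", "1/7/2017 7:57"),
  stringsAsFactors = FALSE
)

# Parse once, then convert to seconds since the epoch
parsed <- as.POSIXct(data$time, format = "%m/%d/%Y %H:%M", tz = "UTC")
secs   <- as.numeric(parsed)

# Per patient: 0 for the first row, then the difference to the previous row;
# dividing by 60 turns seconds into minutes
data$diff_in_min <- ave(secs, data$patient,
                        FUN = function(x) c(0, diff(x))) / 60

data$diff_in_min
# 0 59 0 206
```

Because the unit is fixed by the arithmetic rather than chosen by R, the result does not depend on how large or small the gaps in a given group are.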