How to group and count rows based on a time-interval condition

Time: 2019-07-18 22:54:15

Tags: datetime group-by count dplyr mutate

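Working in R, I have a data frame with three variables (ID, date-time, and blood pressure). Each row is one blood pressure measurement for a person, together with the time it was taken. Each person has multiple rows.

For each row, I want to count how many of the same person's measurements fall within the 60 minutes before that row's measurement.

Here is some sample data: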

my_df<-data.frame(ID=c("A","A","A","A","A","A","B","B","B","C","C","C","C","C"),
 Measured_DT_TM=as.POSIXct(c("2018-08-01 08:00:00","2018-08-01 08:20:00","2018-08-01 08:30:00","2018-08-01 08:35:00","2018-08-01 11:00:00","2018-08-01 11:30:00","2018-08-01 14:10:00","2018-08-01 15:40:00","2018-08-01 15:00:00","2018-08-01 13:00:00","2018-08-01 13:05:00","2018-08-01 13:30:00","2018-08-01 13:55:00","2018-08-01 14:40:00")),
 blood_pressure=c(115,115,120,130,140,130,120,125,125,150,160,130,130,131))

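First, I grouped the data by person and sorted it by time within each group. I then created (mutated) two new variables: the time from the person's first measurement to the current measurement, and the time from the previous measurement to the current measurement.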

library(dplyr)
my_df_1<-my_df %>% 
  group_by(ID) %>% 
  arrange(Measured_DT_TM, .by_group=TRUE) %>% 
  mutate(time_since_first_measure=difftime(Measured_DT_TM, first(Measured_DT_TM), units = c("mins")),
         time_since_prev_measure=difftime(Measured_DT_TM, lag(Measured_DT_TM, n=1), units = c("mins")))

my_df_1
   ID    Measured_DT_TM         bp time_since_first_measure time_since_prev_measure
   <fct> <dttm>              <dbl> <drtn>                   <drtn>                 
 1 A     2018-08-01 08:00:00   115   0 mins                  NA mins               
 2 A     2018-08-01 08:20:00   115  20 mins                  20 mins               
 3 A     2018-08-01 08:30:00   120  30 mins                  10 mins               
 4 A     2018-08-01 08:35:00   130  35 mins                   5 mins               
 5 A     2018-08-01 11:00:00   140 180 mins                 145 mins               
 6 A     2018-08-01 11:30:00   130 210 mins                  30 mins               
 7 B     2018-08-01 14:10:00   120   0 mins                  NA mins               
 8 B     2018-08-01 15:00:00   125  50 mins                  50 mins               
 9 B     2018-08-01 15:40:00   125  90 mins                  40 mins               
10 C     2018-08-01 13:00:00   150   0 mins                  NA mins               
11 C     2018-08-01 13:05:00   160   5 mins                   5 mins               
12 C     2018-08-01 13:30:00   130  30 mins                  25 mins               
13 C     2018-08-01 13:55:00   130  55 mins                  25 mins               
14 C     2018-08-01 14:40:00   131 100 mins                  45 mins               

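This is where I am stuck: how do I create (mutate) a new variable that counts, per person, the number of rows in the 60 minutes before the current row? I am trying to create the measures_in_prev_60m column shown below.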

   ID    Measured_DT_TM         bp time_since_first_measure time_since_prev_measure measures_in_prev_60m
   <fct> <dttm>              <dbl> <drtn>                   <drtn>                                 <dbl>
 1 A     2018-08-01 08:00:00   115   0 mins                  NA mins                                  NA
 2 A     2018-08-01 08:20:00   115  20 mins                  20 mins                                   1
 3 A     2018-08-01 08:30:00   120  30 mins                  10 mins                                   2
 4 A     2018-08-01 08:35:00   130  35 mins                   5 mins                                   3
 5 A     2018-08-01 11:00:00   140 180 mins                 145 mins                                   0
 6 A     2018-08-01 11:30:00   130 210 mins                  30 mins                                   1
 7 B     2018-08-01 14:10:00   120   0 mins                  NA mins                                  NA
 8 B     2018-08-01 15:00:00   125  50 mins                  50 mins                                   1
 9 B     2018-08-01 15:40:00   125  90 mins                  40 mins                                   1
10 C     2018-08-01 13:00:00   150   0 mins                  NA mins                                  NA
11 C     2018-08-01 13:05:00   160   5 mins                   5 mins                                   1
12 C     2018-08-01 13:30:00   130  30 mins                  25 mins                                   2
13 C     2018-08-01 13:55:00   130  55 mins                  25 mins                                   3
14 C     2018-08-01 14:40:00   131 100 mins                  45 mins                                   1

Can anyone offer advice or help? Thanks.

1 answer:

Answer 0: (score: 0)

This is a good use case for list columns, used together with the purrr package from the tidyverse.
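As a minimal sketch of what a list column looks like, the toy example below stores each group's full vector in every row and then compares it against that row's own value. The names toy, mins, all_mins and n_earlier are hypothetical and chosen only for illustration.

```r
library(dplyr)
library(purrr)

# Toy data (hypothetical): minutes since the first measurement for one person.
toy <- tibble(id = c("A", "A", "A"), mins = c(0, 20, 30)) %>%
  group_by(id) %>%
  mutate(
    # list() wraps the group's whole vector; it is recycled to every row of the group
    all_mins = list(mins),
    # for each row, count the group's values that are strictly earlier than this row's value
    n_earlier = map2_int(all_mins, mins, ~ sum(.x < .y))
  ) %>%
  ungroup()
```

Each element of all_mins is the complete vector for that ID, so a row-wise function can see every other measurement in the group.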

I use mutate(y = list(x)) to put all of the durations for each ID into every row, which creates a list column. I then create the condition (the cutoff) for each row. Next I use pmap, which operates on each row and accepts multiple inputs (here, the set of durations and the cutoff times), to test whether each duration qualifies (falls within the previous 60 minutes). Finally, for each row, I sum the elements that qualify.

library(tidyverse, quietly = TRUE)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> Warning: package 'ggplot2' was built under R version 3.5.3
#> Warning: package 'tibble' was built under R version 3.5.3
#> Warning: package 'tidyr' was built under R version 3.5.3
#> Warning: package 'readr' was built under R version 3.5.2
#> Warning: package 'purrr' was built under R version 3.5.3
#> Warning: package 'dplyr' was built under R version 3.5.3
#> Warning: package 'stringr' was built under R version 3.5.2
#> Warning: package 'forcats' was built under R version 3.5.2

my_df<-data.frame(ID=c("A","A","A","A","A","A","B","B","B","C","C","C","C","C"),
 Measured_DT_TM=as.POSIXct(c("2018-08-01 08:00:00","2018-08-01 08:20:00","2018-08-01 08:30:00","2018-08-01 08:35:00","2018-08-01 11:00:00","2018-08-01 11:30:00","2018-08-01 14:10:00","2018-08-01 15:40:00","2018-08-01 15:00:00","2018-08-01 13:00:00","2018-08-01 13:05:00","2018-08-01 13:30:00","2018-08-01 13:55:00","2018-08-01 14:40:00")),
 blood_pressure=c(115,115,120,130,140,130,120,125,125,150,160,130,130,131)) %>% 
  group_by(ID) %>% 
  arrange(Measured_DT_TM, .by_group=TRUE) %>% 
  mutate(time_since_first_measure=difftime(Measured_DT_TM, first(Measured_DT_TM), units = c("mins")),
         time_since_prev_measure=difftime(Measured_DT_TM, lag(Measured_DT_TM, n=1), units = c("mins")))

# steps broken out for readability
my_df %>% 
  mutate(all_measures_by_ID = list(time_since_first_measure),
         cutoff_60 = time_since_first_measure - 60,
         check_for_measures_within_prev_60m = pmap(list(all_measures_by_ID, time_since_first_measure, cutoff_60), ~(..1 < ..2 & ..1 >= ..3)),
         no_measures_in_prev_60m = map(check_for_measures_within_prev_60m, sum)) %>% 
  View()

# results in one line and no extra columns
my_df %>% 
  mutate(no_measures_in_prev_60m = pmap(list(list(time_since_first_measure), time_since_first_measure, time_since_first_measure - 60), ~sum(..1 < ..2 & ..1 >= ..3))) %>% 
  unnest(no_measures_in_prev_60m) %>% 
  select(no_measures_in_prev_60m, everything())
#> # A tibble: 14 x 6
#> # Groups:   ID [3]
#>    no_measures_in_~ ID    Measured_DT_TM      blood_pressure
#>               <int> <fct> <dttm>                       <dbl>
#>  1                0 A     2018-08-01 08:00:00            115
#>  2                1 A     2018-08-01 08:20:00            115
#>  3                2 A     2018-08-01 08:30:00            120
#>  4                3 A     2018-08-01 08:35:00            130
#>  5                0 A     2018-08-01 11:00:00            140
#>  6                1 A     2018-08-01 11:30:00            130
#>  7                0 B     2018-08-01 14:10:00            120
#>  8                1 B     2018-08-01 15:00:00            125
#>  9                1 B     2018-08-01 15:40:00            125
#> 10                0 C     2018-08-01 13:00:00            150
#> 11                1 C     2018-08-01 13:05:00            160
#> 12                2 C     2018-08-01 13:30:00            130
#> 13                3 C     2018-08-01 13:55:00            130
#> 14                1 C     2018-08-01 14:40:00            131
#> # ... with 2 more variables: time_since_first_measure <drtn>,
#> #   time_since_prev_measure <drtn>


Created on 2019-07-21 by the reprex package (v0.3.0)
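The desired output in the question shows NA for each person's first measurement, whereas the result above shows 0 there. The following is only a sketch of one alternative that works directly on the timestamps and reproduces the NA convention. It is not the answerer's method: count_prev_window is a hypothetical helper, the tz = "UTC" argument is an assumption added for reproducibility, and the window is taken to include a measurement exactly 60 minutes earlier while excluding the current row.

```r
library(dplyr)

# Hypothetical helper: for each timestamp, count the timestamps in the same
# vector that fall in the window [t - window_mins, t), i.e. before the current one.
count_prev_window <- function(times, window_mins = 60) {
  secs <- as.numeric(times)  # POSIXct -> seconds since the epoch
  vapply(secs,
         function(s) sum(secs < s & secs >= s - window_mins * 60),
         integer(1))
}

# Sample data from the question; tz = "UTC" is an assumption for reproducibility.
my_df <- data.frame(
  ID = c("A","A","A","A","A","A","B","B","B","C","C","C","C","C"),
  Measured_DT_TM = as.POSIXct(c(
    "2018-08-01 08:00:00","2018-08-01 08:20:00","2018-08-01 08:30:00",
    "2018-08-01 08:35:00","2018-08-01 11:00:00","2018-08-01 11:30:00",
    "2018-08-01 14:10:00","2018-08-01 15:40:00","2018-08-01 15:00:00",
    "2018-08-01 13:00:00","2018-08-01 13:05:00","2018-08-01 13:30:00",
    "2018-08-01 13:55:00","2018-08-01 14:40:00"), tz = "UTC"),
  blood_pressure = c(115,115,120,130,140,130,120,125,125,150,160,130,130,131)
)

result <- my_df %>%
  group_by(ID) %>%
  arrange(Measured_DT_TM, .by_group = TRUE) %>%
  mutate(
    # count per person, then blank out the first measurement of each person
    # (assumption: NA there, as in the question's desired table)
    measures_in_prev_60m = replace(count_prev_window(Measured_DT_TM), 1, NA_integer_)
  ) %>%
  ungroup()
```

The helper compares every timestamp with every other timestamp in the group, so the work grows quadratically with the number of measurements per person. That is fine for data of this size but may be slow for very long series.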