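Working in R, I have a data frame with three variables (ID, date-time, and blood pressure), where each row is a blood pressure measurement for a person along with the time it was taken. There are multiple rows per person.
I would like to count the number of rows/measurements in the 60 minutes before the current row/measurement (per person).
Here is some example data: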
my_df<-data.frame(ID=c("A","A","A","A","A","A","B","B","B","C","C","C","C","C"),
Measured_DT_TM=as.POSIXct(c("2018-08-01 08:00:00","2018-08-01 08:20:00","2018-08-01 08:30:00","2018-08-01 08:35:00","2018-08-01 11:00:00","2018-08-01 11:30:00","2018-08-01 14:10:00","2018-08-01 15:40:00","2018-08-01 15:00:00","2018-08-01 13:00:00","2018-08-01 13:05:00","2018-08-01 13:30:00","2018-08-01 13:55:00","2018-08-01 14:40:00")),
blood_pressure=c(115,115,120,130,140,130,120,125,125,150,160,130,130,131))
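As a start, I have grouped the data by person and arranged it by time. I have created (mutated) one new variable holding the time from the first row/measurement to the current row/measurement (per person), and another holding the time from the previous measurement to the current one.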
library(dplyr)
my_df_1<-my_df %>%
group_by(ID) %>%
arrange(Measured_DT_TM, .by_group=TRUE) %>%
mutate(time_since_first_measure=difftime(Measured_DT_TM, first(Measured_DT_TM), units = c("mins")),
time_since_prev_measure=difftime(Measured_DT_TM, lag(Measured_DT_TM, n=1), units = c("mins")))
my_df_1
ID Measured_DT_TM bp time_since_first_measure time_since_prev_measure
<fct> <dttm> <dbl> <drtn> <drtn>
1 A 2018-08-01 08:00:00 115 0 mins NA mins
2 A 2018-08-01 08:20:00 115 20 mins 20 mins
3 A 2018-08-01 08:30:00 120 30 mins 10 mins
4 A 2018-08-01 08:35:00 130 35 mins 5 mins
5 A 2018-08-01 11:00:00 140 180 mins 145 mins
6 A 2018-08-01 11:30:00 130 210 mins 30 mins
7 B 2018-08-01 14:10:00 120 0 mins NA mins
8 B 2018-08-01 15:00:00 125 50 mins 50 mins
9 B 2018-08-01 15:40:00 125 90 mins 40 mins
10 C 2018-08-01 13:00:00 150 0 mins NA mins
11 C 2018-08-01 13:05:00 160 5 mins 5 mins
12 C 2018-08-01 13:30:00 130 30 mins 25 mins
13 C 2018-08-01 13:55:00 130 55 mins 25 mins
14 C 2018-08-01 14:40:00 131 100 mins 45 mins
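This is where I am stuck: how do I create/mutate a new variable that counts the number of rows in the 60 minutes before the current row (per person)? I would like to create the measures_in_prev_60m variable/column as shown: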
ID Measured_DT_TM bp time_since_first_measure time_since_prev_measure measures_in_prev_60m
<fct> <dttm> <dbl> <drtn> <drtn> <dbl>
1 A 2018-08-01 08:00:00 115 0 mins NA mins NA
2 A 2018-08-01 08:20:00 115 20 mins 20 mins 1
3 A 2018-08-01 08:30:00 120 30 mins 10 mins 2
4 A 2018-08-01 08:35:00 130 35 mins 5 mins 3
5 A 2018-08-01 11:00:00 140 180 mins 145 mins 0
6 A 2018-08-01 11:30:00 130 210 mins 30 mins 1
7 B 2018-08-01 14:10:00 120 0 mins NA mins NA
8 B 2018-08-01 15:00:00 125 50 mins 50 mins 1
9 B 2018-08-01 15:40:00 125 90 mins 40 mins 1
10 C 2018-08-01 13:00:00 150 0 mins NA mins NA
11 C 2018-08-01 13:05:00 160 5 mins 5 mins 1
12 C 2018-08-01 13:30:00 130 30 mins 25 mins 2
13 C 2018-08-01 13:55:00 130 55 mins 25 mins 3
14 C 2018-08-01 14:40:00 131 100 mins 45 mins 1
Can anyone offer advice/help? Thanks.
Answer 0 (score: 0)
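List columns are part of the tidyverse and the purrr package, and this is a good use case for them.
I use mutate(y = list(x)) to put all of the durations for each ID into every row, which creates a list column. I then create the condition (the cutoff) for each row, and use pmap to test whether each duration qualifies, that is, falls within the previous 60 minutes.
library(tidyverse, quietly = TRUE)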
my_df<-data.frame(ID=c("A","A","A","A","A","A","B","B","B","C","C","C","C","C"),
Measured_DT_TM=as.POSIXct(c("2018-08-01 08:00:00","2018-08-01 08:20:00","2018-08-01 08:30:00","2018-08-01 08:35:00","2018-08-01 11:00:00","2018-08-01 11:30:00","2018-08-01 14:10:00","2018-08-01 15:40:00","2018-08-01 15:00:00","2018-08-01 13:00:00","2018-08-01 13:05:00","2018-08-01 13:30:00","2018-08-01 13:55:00","2018-08-01 14:40:00")),
blood_pressure=c(115,115,120,130,140,130,120,125,125,150,160,130,130,131)) %>%
group_by(ID) %>%
arrange(Measured_DT_TM, .by_group=TRUE) %>%
mutate(time_since_first_measure=difftime(Measured_DT_TM, first(Measured_DT_TM), units = c("mins")),
time_since_prev_measure=difftime(Measured_DT_TM, lag(Measured_DT_TM, n=1), units = c("mins")))
# steps broken out for readability
my_df %>%
mutate(all_measures_by_ID = list(time_since_first_measure),
cutoff_60 = time_since_first_measure - 60,
check_for_measures_within_prev_60m = pmap(list(all_measures_by_ID, time_since_first_measure, cutoff_60), ~(..1 < ..2 & ..1 >= ..3)),
no_measures_in_prev_60m = map(check_for_measures_within_prev_60m, sum)) %>%
View()
# results in one line and no extra columns
my_df %>%
mutate(no_measures_in_prev_60m = pmap(list(list(time_since_first_measure), time_since_first_measure, time_since_first_measure - 60),
~sum(..1 < ..2 & ..1 >= ..3))) %>%
unnest(no_measures_in_prev_60m) %>%
select(no_measures_in_prev_60m, everything())
#> # A tibble: 14 x 6
#> # Groups: ID [3]
#> no_measures_in_~ ID Measured_DT_TM blood_pressure
#> <int> <fct> <dttm> <dbl>
#> 1 0 A 2018-08-01 08:00:00 115
#> 2 1 A 2018-08-01 08:20:00 115
#> 3 2 A 2018-08-01 08:30:00 120
#> 4 3 A 2018-08-01 08:35:00 130
#> 5 0 A 2018-08-01 11:00:00 140
#> 6 1 A 2018-08-01 11:30:00 130
#> 7 0 B 2018-08-01 14:10:00 120
#> 8 1 B 2018-08-01 15:00:00 125
#> 9 1 B 2018-08-01 15:40:00 125
#> 10 0 C 2018-08-01 13:00:00 150
#> 11 1 C 2018-08-01 13:05:00 160
#> 12 2 C 2018-08-01 13:30:00 130
#> 13 3 C 2018-08-01 13:55:00 130
#> 14 1 C 2018-08-01 14:40:00 131
#> # ... with 2 more variables: time_since_first_measure <drtn>,
#> # time_since_prev_measure <drtn>
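To see where the counts in the output above come from, the same comparison can be reproduced for person A alone. This is a minimal base R sketch, not the answer's code: the vector `mins` is typed in by hand from the time_since_first_measure values for ID A, and `count_prev_60` is a name introduced here for illustration.

```r
# Minutes since the first measurement for person A (copied from the example data)
mins <- c(0, 20, 30, 35, 180, 210)

# For each measurement time t, count the measurements that are strictly
# earlier than t and no more than 60 minutes before it.
count_prev_60 <- vapply(
  mins,
  function(t) sum(mins < t & mins >= t - 60),
  integer(1)
)

count_prev_60
```

The condition mirrors the `..1 < ..2 & ..1 >= ..3` test in the pmap call, so a measurement taken exactly 60 minutes earlier is still counted.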
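pmap operates on each row and accepts multiple inputs (here, the collection of durations and the cutoff). For each row, I then sum the elements that meet the condition.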
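One difference from the table requested in the question: this approach returns 0 for each person's first measurement, whereas the desired output shows NA there. If NA is preferred, the following base R sketch is one possible way to get it. It is an alternative written without the tidyverse, not the method used above; the helper name `count_prev_window`, its `window_secs` argument, and the explicit `tz = "UTC"` are assumptions made for this example.

```r
my_df <- data.frame(
  ID = c("A","A","A","A","A","A","B","B","B","C","C","C","C","C"),
  Measured_DT_TM = as.POSIXct(c(
    "2018-08-01 08:00:00","2018-08-01 08:20:00","2018-08-01 08:30:00",
    "2018-08-01 08:35:00","2018-08-01 11:00:00","2018-08-01 11:30:00",
    "2018-08-01 14:10:00","2018-08-01 15:40:00","2018-08-01 15:00:00",
    "2018-08-01 13:00:00","2018-08-01 13:05:00","2018-08-01 13:30:00",
    "2018-08-01 13:55:00","2018-08-01 14:40:00"), tz = "UTC"),
  blood_pressure = c(115,115,120,130,140,130,120,125,125,150,160,130,130,131)
)

# Sort by person, then by measurement time
my_df <- my_df[order(my_df$ID, my_df$Measured_DT_TM), ]

# Hypothetical helper: for one person's times (in seconds), count the earlier
# measurements inside the window, and set the earliest measurement(s) to NA.
count_prev_window <- function(secs, window_secs = 3600) {
  n <- vapply(
    secs,
    function(t) as.numeric(sum(secs < t & secs >= t - window_secs)),
    numeric(1)
  )
  n[secs == min(secs)] <- NA  # no earlier measurement exists for these rows
  n
}

# ave() applies the helper within each ID and keeps the original row order
my_df$measures_in_prev_60m <- ave(
  as.numeric(my_df$Measured_DT_TM),
  my_df$ID,
  FUN = count_prev_window
)

my_df
```

Because the helper compares every time against every other time within a person, it is quadratic in the number of measurements per ID, which should be fine for data of this size but may be slow for very long series.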
Created on 2019-07-21 by the reprex package (v0.3.0)