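I want to use the tidyverse to calculate the average number of inpatients for each hour of the day. Can anyone help?
Here are the ID, admission (Adm) and discharge (Disc) times: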
ID = c(101, 102, 103, 104, 105, 106, 107)
Adm = as.POSIXct(c("2012-01-12 00:52:00", "2012-01-12 00:55:00", "2012-02-12 01:35:00",
                   "2012-02-12 03:24:00", "2012-02-12 04:24:00",
                   "2012-02-12 05:24:00", "2012-02-12 05:28:00"))
Disc = as.POSIXct(c("2012-01-13 02:00:00", "2012-01-13 02:59:00", "2012-02-12 03:01:00",
                    "2012-02-12 05:01:00", "2012-02-12 06:01:00",
                    "2012-02-12 08:01:00", "2012-02-12 08:01:00"))
df = data.frame(ID, Adm, Disc)
Could someone please help?
Answer 0 (score: 1)
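Here is a tidyverse approach.
First, both timestamps are truncated to the start of their hour, and the hours between Adm and Disc are generated with seq. For example, for ID = 101, with Adm = 2012-01-12 00:52:00 and Disc = 2012-01-12 02:00:00, the hours in between are 2012-01-12 00:00:00, 2012-01-12 01:00:00 and 2012-01-12 02:00:00. For each row, these hours are concatenated into a single column, hours_list, using paste, and that column is then split into multiple rows using separate_rows.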
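To make this first step concrete, here is a minimal sketch for ID 101 alone, using the Adm/Disc values quoted above. It assumes lubridate's floor_date() for the truncation to the hour and uses tz = "UTC" (an assumption, to avoid daylight-saving gaps); it is only an illustration of the seq/paste step, not the full answer.

```r
library(lubridate)

# ID 101's stay as quoted above; tz = "UTC" is an assumption to sidestep DST gaps
adm  <- as.POSIXct("2012-01-12 00:52:00", tz = "UTC")
disc <- as.POSIXct("2012-01-12 02:00:00", tz = "UTC")

# Truncate both timestamps to the start of their hour, then step hour by hour
hours <- seq(floor_date(adm, "hour"), floor_date(disc, "hour"), by = "hour")

# Collapse into one comma-separated string, as the paste() step does
paste(format(hours, "%Y-%m-%d %H:%M:%S"), collapse = ",")
# "2012-01-12 00:00:00,2012-01-12 01:00:00,2012-01-12 02:00:00"
```

separate_rows() then turns that single string back into three rows, one per hour.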
Finally, the number of unique IDs present in each hour is counted by grouping on hours_list:
library(tidyverse)
library(lubridate)
df %>%
  mutate(across(c(Adm, Disc), ~ floor_date(.x, unit = "hour"))) %>% # truncate each timestamp to the start of its hour
  rowwise() %>%
  mutate(hours_list = paste(format(seq(Adm, Disc, by = "hour"), "%Y-%m-%d %H:%M:%S"), collapse = ",")) %>% # every hour between Adm & Disc, concatenated with ','; the explicit format keeps "00:00:00" at midnight
  ungroup() %>%
  separate_rows(hours_list, sep = ",") %>% # one row per patient-hour
  mutate(hours_list = ymd_hms(hours_list)) %>% # hour strings are converted back into POSIXct format
  group_by(hours_list) %>%
  summarise(patient_count = n_distinct(ID)) # unique patient count per hour
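Output (the 2012-01-12 rows assume IDs 101 and 102 were discharged on 2012-01-12; with the question's 2012-01-13 discharge dates, both would also be counted in every hour up to 2012-01-13 02:00:00):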
hours_list patient_count
<dttm> <int>
1 2012-01-12 00:00:00 2
2 2012-01-12 01:00:00 2
3 2012-01-12 02:00:00 2
4 2012-02-12 01:00:00 1
5 2012-02-12 02:00:00 1
6 2012-02-12 03:00:00 2
7 2012-02-12 04:00:00 2
8 2012-02-12 05:00:00 4
9 2012-02-12 06:00:00 3
10 2012-02-12 07:00:00 2
11 2012-02-12 08:00:00 2
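The counts above are per clock hour, while the question asks for an average for each hour of the day. One way to get there is sketched below, under stated assumptions: the average is taken over every day on which at least one patient was present, an hour with nobody present on such a day counts as 0, and tz = "UTC" is used. It also swaps the paste()/separate_rows() round trip for a list column plus tidyr::unnest(); the names hour_start, n_patients and avg_patients are illustrative.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Question's data, re-created with tz = "UTC" (an assumption, to avoid DST gaps)
df <- data.frame(
  ID   = c(101, 102, 103, 104, 105, 106, 107),
  Adm  = as.POSIXct(c("2012-01-12 00:52:00", "2012-01-12 00:55:00", "2012-02-12 01:35:00",
                      "2012-02-12 03:24:00", "2012-02-12 04:24:00", "2012-02-12 05:24:00",
                      "2012-02-12 05:28:00"), tz = "UTC"),
  Disc = as.POSIXct(c("2012-01-13 02:00:00", "2012-01-13 02:59:00", "2012-02-12 03:01:00",
                      "2012-02-12 05:01:00", "2012-02-12 06:01:00", "2012-02-12 08:01:00",
                      "2012-02-12 08:01:00"), tz = "UTC")
)

# 1. One row per patient per clock hour of the stay (list column + unnest)
patient_hours <- df %>%
  rowwise() %>%
  mutate(hour_start = list(seq(floor_date(Adm, "hour"), floor_date(Disc, "hour"), by = "hour"))) %>%
  ungroup() %>%
  unnest(hour_start)

# 2. Number of distinct patients present in each clock hour
hourly <- patient_hours %>%
  group_by(hour_start) %>%
  summarise(n_patients = n_distinct(ID))

# 3. Average per hour of day over every day with at least one patient;
#    hours with nobody present on such a day are filled in as 0
avg_by_hour <- hourly %>%
  mutate(day = as_date(hour_start), hour = hour(hour_start)) %>%
  complete(day, hour = 0:23, fill = list(n_patients = 0)) %>%
  group_by(hour) %>%
  summarise(avg_patients = mean(n_patients))

avg_by_hour
```

With the question's data this averages over three days (2012-01-12, 2012-01-13 and 2012-02-12), so hour 5, for example, averages (2 + 0 + 4) / 3 = 2 patients. If the average should instead run over every calendar day between the first admission and the last discharge, `day` inside complete() can be replaced with `day = seq(min(day), max(day), by = "day")`.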