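I have a data frame with a patient id, admission and discharge dates, and the number of days spent in hospital. I need a new data frame with one row per id for every day between the patient's admission (dt_in) and discharge (dt_out), inclusive.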
id dt_in dt_out stay
4317107984013 2017-11-12 2017-11-28 16 # First row
4317107984035 2017-11-22 2017-11-29 7 # Second row
4317107984046 2017-11-18 2017-11-29 11
4317107984057 2017-11-27 2017-11-29 2
4317107984079 2017-11-15 2017-11-29 14
4317107984090 2017-11-19 2017-11-29 10
4318100215913 2018-01-04 2018-01-04 0
4317108791611 2017-12-14 2017-12-14 0
4317107931059 2017-11-23 2017-11-23 0
4317108756092 2017-11-23 2017-12-27 34
For the first 2 rows above, I need something like this:
id dt_in
4317107984013 2017-11-12 # First row
4317107984013 2017-11-13
4317107984013 2017-11-14
4317107984013 2017-11-15
4317107984013 2017-11-16
4317107984013 2017-11-17
4317107984013 2017-11-18
4317107984013 2017-11-19
4317107984013 2017-11-20
4317107984013 2017-11-21
4317107984013 2017-11-22
4317107984013 2017-11-23
4317107984013 2017-11-24
4317107984013 2017-11-25
4317107984013 2017-11-26
4317107984013 2017-11-27
4317107984013 2017-11-28
4317107984035 2017-11-22 # Second row
4317107984035 2017-11-23
4317107984035 2017-11-24
4317107984035 2017-11-25
4317107984035 2017-11-26
4317107984035 2017-11-27
4317107984035 2017-11-28
4317107984035 2017-11-29
...
My system: R version 3.5.1 (2018-07-02), Platform: x86_64-apple-darwin15.6.0 (64-bit), Running under: macOS 10.14.4
# Here is my original DF:
df <- structure(list(id = c("4317107984013", "4317107984035", "4317107984046",
"4317107984057", "4317107984079", "4317107984090", "4318100215913",
"4317108791611", "4317107931059", "4317108756092"), dt_in = structure(c(17482,
17492, 17488, 17497, 17485, 17489, 17535, 17514, 17493, 17493
), class = "Date"), dt_out = structure(c(17498, 17499, 17499,
17499, 17499, 17499, 17535, 17514, 17493, 17527), class = "Date"),
stay = c(16L, 7L, 11L, 2L, 14L, 10L, 0L, 0L, 0L, 34L)), row.names = c(NA,
10L), class = "data.frame")
Answer 0 (score: 1)
library(tidyverse)
library(lubridate)
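df %>%
group_by(id) %>%
# build one list-column entry per id holding every date from dt_in to dt_out
transmute(dt = list(seq(ymd(dt_in), ymd(dt_out), 1))) %>%
# expand the list-column into one row per date
unnest(dt)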
# A tibble: 104 x 2
# Groups: id [10]
id dt
<chr> <date>
1 4317107984013 2017-11-12
2 4317107984013 2017-11-13
3 4317107984013 2017-11-14
4 4317107984013 2017-11-15
5 4317107984013 2017-11-16
6 4317107984013 2017-11-17
7 4317107984013 2017-11-18
8 4317107984013 2017-11-19
9 4317107984013 2017-11-20
10 4317107984013 2017-11-21
# ... with 94 more rows
Answer 1 (score: 1)
With dplyr and tidyr, you can do the following:
df %>%
group_by(id) %>%
complete(dt_in = seq.Date(dt_in, dt_out, "day")) %>%
select(id, dt_in)
id dt_in
<chr> <date>
1 4317107931059 2017-11-23
2 4317107984013 2017-11-12
3 4317107984013 2017-11-13
4 4317107984013 2017-11-14
5 4317107984013 2017-11-15
6 4317107984013 2017-11-16
7 4317107984013 2017-11-17
8 4317107984013 2017-11-18
9 4317107984013 2017-11-19
10 4317107984013 2017-11-20
# … with 94 more rows
Answer 2 (score: 1)
You can use uncount from tidyr:
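df %>%
# repeat each row `stay` times; .id numbers the copies 1..stay
# (rows with stay == 0 are dropped, and the dt_out day itself is not reached)
uncount(stay, .id = "stay") %>%
mutate(
# shift the admission date forward by the copy number
dt_in = as.Date(dt_in) + stay - 1
) %>%
select(-stay, -dt_out)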
# showing results for only 1st id
id dt_in
1 4.317108e+12 2017-11-12
2 4.317108e+12 2017-11-13
3 4.317108e+12 2017-11-14
4 4.317108e+12 2017-11-15
5 4.317108e+12 2017-11-16
6 4.317108e+12 2017-11-17
7 4.317108e+12 2017-11-18
8 4.317108e+12 2017-11-19
9 4.317108e+12 2017-11-20
10 4.317108e+12 2017-11-21
11 4.317108e+12 2017-11-22
12 4.317108e+12 2017-11-23
13 4.317108e+12 2017-11-24
14 4.317108e+12 2017-11-25
15 4.317108e+12 2017-11-26
16 4.317108e+12 2017-11-27
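If the discharge day and same-day stays (stay = 0) should also appear, as in the output the question asks for, one possible variant of the uncount approach is to expand each row stay + 1 times. This is only a minimal sketch: `df_small`, `n_days` and `day` are names made up for illustration, and the two rows are copied from the question's data.

```r
library(dplyr)
library(tidyr)

# Two rows copied from the question: one multi-day stay and one same-day stay
df_small <- data.frame(
  id = c("4317107984035", "4317107931059"),
  dt_in = as.Date(c("2017-11-22", "2017-11-23")),
  dt_out = as.Date(c("2017-11-29", "2017-11-23")),
  stay = c(7L, 0L),
  stringsAsFactors = FALSE
)

res <- df_small %>%
  mutate(n_days = stay + 1L) %>%        # calendar days, counting dt_in and dt_out
  uncount(n_days, .id = "day") %>%      # repeat each row n_days times, numbered 1..n_days
  mutate(dt_in = dt_in + day - 1L) %>%  # shift the admission date forward
  select(id, dt_in)

print(res)
```

With these two rows the result has 8 rows for the first id (2017-11-22 through 2017-11-29) and 1 row for the same-day stay.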
Answer 3 (score: 0)
With map2:
library(tidyverse)
df %>%
transmute(id, dt_in = map2(dt_in, dt_out, seq, by = '1 day')) %>%
unnest
# A tibble: 104 x 2
# id dt_in
# <chr> <date>
# 1 4317107984013 2017-11-12
# 2 4317107984013 2017-11-13
# 3 4317107984013 2017-11-14
# 4 4317107984013 2017-11-15
# 5 4317107984013 2017-11-16
# 6 4317107984013 2017-11-17
# 7 4317107984013 2017-11-18
# 8 4317107984013 2017-11-19
# 9 4317107984013 2017-11-20
#10 4317107984013 2017-11-21
# … with 94 more rows
Or an option with base R:
lst1 <- Map(seq, df$dt_in, df$dt_out, MoreArgs = list(by = "1 day"))
out <- data.frame(id = rep(df$id, lengths(lst1)), dt_in = do.call(c, lst1))
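Whichever approach is used, a quick sanity check is that each id ends up with stay + 1 rows. The sketch below applies the base R option to three rows copied from the question and checks the row count; `df_small` is a name made up for illustration.

```r
# Three rows copied from the question, including a same-day stay
df_small <- data.frame(
  id = c("4317107984013", "4317107984057", "4318100215913"),
  dt_in = as.Date(c("2017-11-12", "2017-11-27", "2018-01-04")),
  dt_out = as.Date(c("2017-11-28", "2017-11-29", "2018-01-04")),
  stay = c(16L, 2L, 0L),
  stringsAsFactors = FALSE
)

# One date sequence per row, from dt_in to dt_out inclusive
lst1 <- Map(seq, df_small$dt_in, df_small$dt_out, MoreArgs = list(by = "1 day"))

# Repeat each id once per date and flatten the list of Date vectors
out <- data.frame(id = rep(df_small$id, lengths(lst1)), dt_in = do.call(c, lst1))

# Every stay contributes stay + 1 calendar days, with no duplicated (id, date) pairs
stopifnot(
  nrow(out) == sum(df_small$stay + 1L),
  anyDuplicated(out) == 0
)
```

The same check on the full data from the question gives 94 + 10 = 104 rows, which matches the `104 x 2` tibbles printed in the answers above.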