Expand a date range into a series of dates (wide to long)

Time: 2019-03-01 18:11:11

Tags: r date

I have data like the following

data <- data.frame(
  "Subject" = c("13434", "14544", "14544", "22222", "22222", "22222"),
  "Period"  = c("MAD", "MAD", "OSE", "MAD", "OSE", "OSE"),
  "Dose"    = c(400, 800, 800, 400, 800, 1200),
  "Start"   = as.Date(c("2017-04-18", "2017-06-13", "2018-09-27",
                        "2017-06-06", "2018-08-21", "2018-12-12")),
  "End"     = as.Date(c("2017-05-16", "2017-07-11", "2019-02-09",
                        "2017-07-04", "2018-12-11", "2019-02-05")))

data
Subject  Period  Dose  Start       End
13434    MAD      400  2017-04-18  2017-05-16
14544    MAD      800  2017-06-13  2017-07-11
14544    OSE      800  2018-09-27  2019-02-09
22222    MAD      400  2017-06-06  2017-07-04
22222    OSE      800  2018-08-21  2018-12-11
22222    OSE     1200  2018-12-12  2019-02-05

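and I want to convert it into something like the following, where each date in a range gets its own row and the dose is accumulated day by day over that range. In an ideal world, when the period changes, the cumulative dose would continue from where the previous period ended.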

Subject Period Sum_Dose   Day
 13434  MAD    400   2017-04-18
 13434  MAD    800   2017-04-19
 13434  MAD   1200   2017-04-20
 13434  MAD   1600   2017-04-21
 13434  MAD   2000   2017-04-22
 13434  MAD   2400   2017-04-23
 Etc., for each subject at the given period and dose.

3 Answers:

Answer 0 (score: 3)

Something like this?

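library(tidyverse)

# dat is the question's data frame (named data there)
dat %>%
  group_by(Subject, Period, Dose) %>%
  # one list element per group, holding every day from Start to End
  summarize(Day = list(seq(Start, End, by = 'day'))) %>% 
  # expand the list column to one row per day
  unnest(Day) %>%
  # summarize() dropped the Dose grouping, so this accumulates per Subject and Period
  mutate(Dose = cumsum(Dose)) %>%
  ungroup()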

Output:

# A tibble: 392 x 4
   Subject Period  Dose Day       
   <fct>   <fct>  <dbl> <date>    
 1 13434   MAD      400 2017-04-18
 2 13434   MAD      800 2017-04-19
 3 13434   MAD     1200 2017-04-20
 4 13434   MAD     1600 2017-04-21
 5 13434   MAD     2000 2017-04-22
 6 13434   MAD     2400 2017-04-23
 7 13434   MAD     2800 2017-04-24
 8 13434   MAD     3200 2017-04-25
 9 13434   MAD     3600 2017-04-26
10 13434   MAD     4000 2017-04-27
# ... with 382 more rows

I assume that the tuple (Subject, Period, Dose) is unique. If it is not, you can add Start and End to the grouping.
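As a rough sketch of that fallback: the toy data below is hypothetical and repeats the tuple (Subject, Period, Dose), so grouping by those three columns alone would hand seq() two start dates and fail. Adding Start and End to the grouping makes each row its own group. The .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: (Subject, Period, Dose) occurs twice
dat <- data.frame(
  Subject = c("1", "1"),
  Period  = c("MAD", "MAD"),
  Dose    = c(400, 400),
  Start   = as.Date(c("2017-04-18", "2017-05-01")),
  End     = as.Date(c("2017-04-19", "2017-05-02"))
)

res <- dat %>%
  # Start and End in the grouping keep the two ranges apart
  group_by(Subject, Period, Dose, Start, End) %>%
  # Start[1] / End[1]: each group holds exactly one date range
  summarize(Day = list(seq(Start[1], End[1], by = "day")), .groups = "drop") %>%
  unnest(Day) %>%
  # accumulate per Subject and Period, in date order
  group_by(Subject, Period) %>%
  arrange(Day, .by_group = TRUE) %>%
  mutate(Dose = cumsum(Dose)) %>%
  ungroup() %>%
  select(Subject, Period, Dose, Day)
```

Each of the two ranges covers two days, so res has four rows and the running dose carries over from the first range into the second.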

The "ideal world" can be reached this way:

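dat %>%
  group_by(Subject, Period, Dose) %>%
  summarize(Day = list(seq(Start, End, by = 'day'))) %>% 
  unnest(Day) %>%
  # regroup by Subject only, so the running total carries across periods
  group_by(Subject) %>%
  # sort by date so that cumsum() runs in chronological order
  arrange(Day) %>%
  mutate(Dose = cumsum(Dose)) %>%
  ungroup() 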

If we append the following line to the code above:

... %>% filter(Day >= as.Date("2018-12-11"), Day <= as.Date("2018-12-12"), 
               Subject == "22222")

it outputs:

  Subject Period   Dose Day       
  <fct>   <fct>   <dbl> <date>    
1 22222   OSE    102000 2018-12-11
2 22222   OSE    103200 2018-12-12

So cumsum appears to be computed correctly: on the first day of the next range, the next dose of 1200 is added to the running total.
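The two totals can also be checked by hand from the question's data. The sketch below uses base R only and counts the days of subject 22222's first two ranges inclusively; the variable names are made up for illustration.

```r
# Inclusive day counts for subject 22222's ranges up to 2018-12-11
mad_days <- as.numeric(as.Date("2017-07-04") - as.Date("2017-06-06")) + 1  # MAD, 400 per day
ose_days <- as.numeric(as.Date("2018-12-11") - as.Date("2018-08-21")) + 1  # OSE, 800 per day

total_1211 <- 400 * mad_days + 800 * ose_days  # cumulative dose on 2018-12-11
total_1212 <- total_1211 + 1200                # first day of the 1200 range
```

With 29 MAD days and 113 OSE days this gives 102000 and 103200, matching the filtered output.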

Answer 1 (score: 0)

Thanks @utubun! I ended up with,


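Answer 2 (score: 0)

If I understand correctly, the OP wants to

  1. expand each row into a sequence of days between the given Start and End dates,
  2. accumulate Dose over all days for each Subject.

Reshaping from wide to long, e.g. with gather() or melt(), is not required here (and points in the wrong direction, IMHO).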

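dplyr and tidyr

Here is an implementation using dplyr and tidyr. Since seq() does not accept vector arguments, we need to group by each row and then expand the dates with unnest().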

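library(dplyr)
library(tidyr)
dat %>% 
  # one group per row, so seq() sees a single Start and End
  group_by(rn = row_number()) %>%
  mutate(Day = list(seq(Start, End, "1 day"))) %>% 
  # name the list column explicitly; a bare unnest() is deprecated in tidyr >= 1.0
  unnest(Day) %>% 
  arrange(Subject, Day) %>% 
  # accumulate across periods within each Subject
  group_by(Subject) %>%
  mutate(Sum_Dose = cumsum(Dose)) %>% 
  select(Subject, Period, Sum_Dose, Day)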

Note that ordering by Day before calling cumsum() is just a precaution in case dat is not already ordered or the date ranges overlap.

# A tibble: 392 x 5
# Groups:   Subject [3]
   Subject Period  Dose Day        Sum_Dose
   <fct>   <fct>  <dbl> <date>        <dbl>
 1 13434   MAD      400 2017-04-18      400
 2 13434   MAD      400 2017-04-19      800
 3 13434   MAD      400 2017-04-20     1200
 4 13434   MAD      400 2017-04-21     1600
 5 13434   MAD      400 2017-04-22     2000
 6 13434   MAD      400 2017-04-23     2400
 7 13434   MAD      400 2017-04-24     2800
 8 13434   MAD      400 2017-04-25     3200
 9 13434   MAD      400 2017-04-26     3600
10 13434   MAD      400 2017-04-27     4000
# ... with 382 more rows
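To illustrate why the row-wise grouping is needed, here is a minimal base R sketch on a hypothetical two-row subset of the data. Calling seq() on whole columns fails because it expects a single start and end, while Map() calls it once per row. This is only an illustration of the same idea, not the code from the answer.

```r
dat <- data.frame(
  Subject = c("13434", "14544"),
  Dose    = c(400, 800),
  Start   = as.Date(c("2017-04-18", "2017-06-13")),
  End     = as.Date(c("2017-04-20", "2017-06-14"))
)

# seq() needs a single 'from' and 'to', so passing whole columns is an error
vector_call <- try(seq(dat$Start, dat$End, by = "day"), silent = TRUE)

# Map() calls seq() once per row and returns a list of Date vectors
days <- Map(seq, dat$Start, dat$End, by = "day")

# Repeat each row once per day, then attach the days
# (do.call(c, ...) keeps the Date class, unlike unlist())
long <- dat[rep(seq_len(nrow(dat)), lengths(days)), c("Subject", "Dose")]
long$Day <- do.call(c, days)

# Running total per Subject
long$Sum_Dose <- ave(long$Dose, long$Subject, FUN = cumsum)
```

Here vector_call holds a "try-error" object, and long has one row per day with the running total in Sum_Dose.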

data.table

The data.table version implements the same approach but is more concise, because the "unnest" operation is done implicitly.

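library(data.table)
# number the rows, expand each row to one row per day (grouped by rn),
# then accumulate Dose per Subject in Day order
setDT(dat)[, rn := .I][
  , .(Subject, Period, Dose, Day = seq(Start, End, "1 day")), by = rn][
    order(Day), .(Period, Sum_Dose = cumsum(Dose), Day), keyby = Subject]
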
     Subject Period Sum_Dose        Day
  1:   13434    MAD      400 2017-04-18
  2:   13434    MAD      800 2017-04-19
  3:   13434    MAD     1200 2017-04-20
  4:   13434    MAD     1600 2017-04-21
  5:   13434    MAD     2000 2017-04-22
 ---                                   
388:   14544    OSE   128800 2019-02-05
389:   14544    OSE   129600 2019-02-06
390:   14544    OSE   130400 2019-02-07
391:   14544    OSE   131200 2019-02-08
392:   14544    OSE   132000 2019-02-09
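The implicit expansion can be seen in isolation in the following sketch, which uses hypothetical toy data and assumes the data.table package is installed. Within each by group, the length-1 id value is recycled to the length of the generated Day vector, so no separate unnest step is needed.

```r
library(data.table)

# Hypothetical toy data: two date ranges of three and two days
dt <- data.table(
  id    = c("a", "b"),
  Start = as.Date(c("2017-04-18", "2017-06-13")),
  End   = as.Date(c("2017-04-20", "2017-06-14"))
)

# Row number as the grouping key, so seq() sees one Start and End per group
dt[, rn := .I]

# id has length 1 per group and is recycled to match Day
long <- dt[, .(id, Day = seq(Start, End, by = "day")), by = rn]
```

The result has one row per day, with id repeated for every day of its range.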