I have data like the following:
data <- data.frame(
  "Subject" = c("13434", "14544", "14544", "22222", "22222", "22222"),
  "Period"  = c("MAD", "MAD", "OSE", "MAD", "OSE", "OSE"),
  "Dose"    = c(400, 800, 800, 400, 800, 1200),
  "Start"   = as.Date(c("2017-04-18", "2017-06-13", "2018-09-27",
                        "2017-06-06", "2018-08-21", "2018-12-12")),
  "End"     = as.Date(c("2017-05-16", "2017-07-11", "2019-02-09",
                        "2017-07-04", "2018-12-11", "2019-02-05")))
data
Subject Period Dose Start End
13434 MAD 400 2017-04-18 2017-05-16
14544 MAD 800 2017-06-13 2017-07-11
14544 OSE 800 2018-09-27 2019-02-09
22222 MAD 400 2017-06-06 2017-07-04
22222 OSE 800 2018-08-21 2018-12-11
22222 OSE 1200 2018-12-12 2019-02-05
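and I want to convert it into something like the following, where every date in a row's range gets its own row and the dose is accumulated day by day across that range. In an ideal world, when the period changes, the cumulative dose would continue from where the previous period ended.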
Subject Period Sum_Dose Day
13434 MAD 400 2017-04-18
13434 MAD 800 2017-04-19
13434 MAD 1200 2017-04-20
13434 MAD 1600 2017-04-21
13434 MAD 2000 2017-04-22
13434 MAD 2400 2017-04-23
Etc.
and so on for each subject, given the periods and doses.
Answer 0 (score: 3)
Something like this?
library(tidyverse)
dat %>%
group_by(Subject, Period, Dose) %>%
summarize(Day = list(seq(Start, End, by = 'day'))) %>%
unnest(Day) %>%
mutate(Dose = cumsum(Dose)) %>%
ungroup()
Output:
# A tibble: 392 x 4
Subject Period Dose Day
<fct> <fct> <dbl> <date>
1 13434 MAD 400 2017-04-18
2 13434 MAD 800 2017-04-19
3 13434 MAD 1200 2017-04-20
4 13434 MAD 1600 2017-04-21
5 13434 MAD 2000 2017-04-22
6 13434 MAD 2400 2017-04-23
7 13434 MAD 2800 2017-04-24
8 13434 MAD 3200 2017-04-25
9 13434 MAD 3600 2017-04-26
10 13434 MAD 4000 2017-04-27
# ... with 382 more rows
I assume the tuple (Subject, Period, Dose) is unique. If it is not, you can add Start and End to the grouping.
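The uniqueness assumption can be checked before grouping. The following is a minimal base R sketch, not part of the original answer: it rebuilds only the three key columns from the question's data, and the repeated row is a hypothetical example added purely for illustration.

```r
# Key columns taken from the question's data
key <- data.frame(
  Subject = c("13434", "14544", "14544", "22222", "22222", "22222"),
  Period  = c("MAD", "MAD", "OSE", "MAD", "OSE", "OSE"),
  Dose    = c(400, 800, 800, 400, 800, 1200)
)

# anyDuplicated() returns 0 when every (Subject, Period, Dose) row is unique
dup_original <- anyDuplicated(key)

# Hypothetical case: the same subject repeats a period at the same dose
key_repeated <- rbind(key, key[5, ])
dup_repeated <- anyDuplicated(key_repeated)  # index of the first repeated row

print(c(dup_original, dup_repeated))
```

A non-zero result means the grouping would merge two dosing windows, which is the case where adding Start and End to group_by() matters.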
The "ideal world" can be reached this way:
dat %>%
group_by(Subject, Period, Dose) %>%
summarize(Day = list(seq(Start, End, by = 'day'))) %>%
unnest(Day) %>%
group_by(Subject) %>%
arrange(Day) %>%
mutate(Dose = cumsum(Dose)) %>%
ungroup()
If we append the following lines to the code above:
... %>% filter(Day >= as.Date("2018-12-11"), Day <= as.Date("2018-12-12"),
Subject == "22222")
it outputs:
Subject Period Dose Day
<fct> <fct> <dbl> <date>
1 22222 OSE 102000 2018-12-11
2 22222 OSE 103200 2018-12-12
So cumsum appears to be computed correctly (the next period's dose of 1200 is added on top of the running total).
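The two totals can also be checked by hand from the question's data for subject 22222. This is only an arithmetic sketch in base R; the helper n_days is a made-up name, and it counts days inclusively because both Start and End receive a dose.

```r
# Inclusive day count between two dates (hypothetical helper)
n_days <- function(start, end) {
  as.integer(as.Date(end) - as.Date(start)) + 1L
}

# Subject 22222: MAD at 400 per day, then the first OSE window at 800 per day
mad   <- 400 * n_days("2017-06-06", "2017-07-04")
ose_1 <- 800 * n_days("2018-08-21", "2018-12-11")

total_dec11 <- mad + ose_1          # running total on 2018-12-11
total_dec12 <- total_dec11 + 1200   # first day of the 1200 dose

print(c(total_dec11, total_dec12))
# 102000 103200
```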
Answer 1 (score: 0)
Thanks @utubun! I ended up with:
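Answer 2 (score: 0)
If I understand correctly, the OP wants a sequence of days between the Start and End dates, with Dose accumulated per Subject. A wide-to-long reshape such as gather() or melt() is not needed here (and points in the wrong direction, IMHO).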
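dplyr and tidyr
Here is an implementation using dplyr and tidyr. Because seq() does not accept vector arguments, we need to group by each row and then use unnest() to expand the dates.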
library(dplyr)
library(tidyr)
dat %>%
  # one group per input row, so seq() sees a single Start and End
  group_by(rn = row_number()) %>%
  mutate(Day = list(seq(Start, End, "1 day"))) %>%
  # expand the list column into one row per day
  unnest(Day) %>%
  arrange(Subject, Day) %>%
  group_by(Subject) %>%
  mutate(Sum_Dose = cumsum(Dose)) %>%
  select(Subject, Period, Sum_Dose, Day)
Note that ordering by Day before calling cumsum() is only a precaution, in case dat is not already ordered or the date ranges overlap.
# A tibble: 392 x 5
# Groups:   Subject [3]
   Subject Period  Dose DAY        Sum_Dose
   <fct>   <fct>  <dbl> <date>        <dbl>
 1 13434   MAD      400 2017-04-18      400
 2 13434   MAD      400 2017-04-19      800
 3 13434   MAD      400 2017-04-20     1200
 4 13434   MAD      400 2017-04-21     1600
 5 13434   MAD      400 2017-04-22     2000
 6 13434   MAD      400 2017-04-23     2400
 7 13434   MAD      400 2017-04-24     2800
 8 13434   MAD      400 2017-04-25     3200
 9 13434   MAD      400 2017-04-26     3600
10 13434   MAD      400 2017-04-27     4000
# ... with 382 more rows
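For comparison, the same row-by-row expansion can be sketched in base R without any packages. This is not the answer's code: the toy data frame and the names pieces and long are assumptions made for illustration, and the approach simply calls seq() once per row because it expects a single from/to pair.

```r
# Toy data: subject B has two consecutive dosing windows
dat <- data.frame(
  Subject = c("A", "B", "B"),
  Dose    = c(400, 800, 1200),
  Start   = as.Date(c("2017-04-18", "2017-06-13", "2017-06-16")),
  End     = as.Date(c("2017-04-20", "2017-06-15", "2017-06-17")),
  stringsAsFactors = FALSE
)

# Build one small data frame per input row, one row per day
pieces <- lapply(seq_len(nrow(dat)), function(i) {
  data.frame(
    Subject = dat$Subject[i],
    Dose    = dat$Dose[i],
    Day     = seq(dat$Start[i], dat$End[i], by = "day"),
    stringsAsFactors = FALSE
  )
})
long <- do.call(rbind, pieces)

# Order by day within subject, then accumulate the dose per subject
long <- long[order(long$Subject, long$Day), ]
long$Sum_Dose <- ave(long$Dose, long$Subject, FUN = cumsum)

print(long)
```

Because the running total is computed per Subject rather than per Period, it carries over from one dosing window to the next, which is the "ideal world" behavior the question asks for.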
data.table
The data.table version implements the same approach but is more concise, because the "unnest" step happens implicitly.
library(data.table)
# add a row id, expand each row to one row per day, then accumulate per Subject
setDT(dat)[, rn := .I][
  , .(Subject, Period, Dose, Day = seq(Start, End, "1 day")), by = rn][
    order(Day), .(Period, Sum_Dose = cumsum(Dose), Day), keyby = Subject]
     Subject Period Sum_Dose        Day
  1:   13434    MAD      400 2017-04-18
  2:   13434    MAD      800 2017-04-19
  3:   13434    MAD     1200 2017-04-20
  4:   13434    MAD     1600 2017-04-21
  5:   13434    MAD     2000 2017-04-22
 ---
388:   14544    OSE   128800 2019-02-05
389:   14544    OSE   129600 2019-02-06
390:   14544    OSE   130400 2019-02-07
391:   14544    OSE   131200 2019-02-08
392:   14544    OSE   132000 2019-02-09