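I'm trying to create a toy example of a dataset in which persons are nested within groups, and each person has 4 or 6 baseline observations and up to 20 postBaseline observations. This is the starting point: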
   person group baseline baselineStart postBaseline
1       1     1        4    2018-06-04           20
2       2     1        6    2018-06-04           20
3       3     1        4    2018-06-04           20
4       4     1        6    2018-06-04           20
5       5     2        4    2018-07-02           20
6       6     2        6    2018-07-02           20
7       7     2        4    2018-07-02           20
8       8     2        6    2018-07-02           20
9       9     3        4    2018-07-30           20
10     10     3        6    2018-07-30           20
11     11     3        4    2018-07-30           20
12     12     3        6    2018-07-30           20
13     13     4        4    2018-09-03           20
14     14     4        6    2018-09-03           20
15     15     4        4    2018-09-03           20
16     16     4        6    2018-09-03           20
17     17     5        4    2018-10-01           18
18     18     5        6    2018-10-01           16
19     19     5        4    2018-10-01           18
20     20     5        6    2018-10-01           16
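As a mini example of what I want, take person 1: I'm looking for a way to create 4 baseline rows for person 1, starting on 2018-06-04, followed by 20 postBaseline rows. The observation date should increase by 1 week from one row to the next.
Here is the initial data frame: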
library(lubridate)  # provides ymd()

mock <- data.frame(person = seq(from = 1, to = 20),
                   group = rep(1:5, times = 1, each = 4),
                   # length-4 vector, recycled by data.frame() to all 20 rows
                   baseline = rep(c(4, 6), times = 2, each = 1),
                   baselineStart = ymd(rep(c("2018-06-04",
                                             "2018-07-02",
                                             "2018-07-30",
                                             "2018-09-03",
                                             "2018-10-01"),
                                           times = 1,
                                           each = 4)
                   ),
                   postBaseline = c(rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(c(18, 16), 2)
                   )
)
Answer 0 (score: 1)
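Here is a solution using the tidyverse. complete is used to expand the data frame so that it contains every obs for each person, and fill then fills in the missing values. To get the correct week we simply add 7 * (obs - 1) days to the start date, and we compare obs with baseline to see whether a row is postBaseline. Finally, we filter out the rows where obs exceeds the length of the study for that person (given by baseline + postBaseline). I show a printout of the boundary between persons 1 and 2, which shows that person 1 ends when expected and that pb_dummy changes correctly.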
mock <- data.frame(person = seq(from = 1, to = 20),
                   group = rep(1:5, times = 1, each = 4),
                   baseline = rep(c(4, 6), times = 2, each = 1),
                   baselineStart = lubridate::ymd(rep(c("2018-06-04",
                                                        "2018-07-02",
                                                        "2018-07-30",
                                                        "2018-09-03",
                                                        "2018-10-01"),
                                                      times = 1,
                                                      each = 4)
                   ),
                   postBaseline = c(rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(c(18, 16), 2)
                   )
)
library(tidyverse)
full <- mock %>%
  mutate(obs = 1) %>%
  # one row per person/obs pair; 6 is the largest baseline count in mock
  complete(person, obs = 1:(max(postBaseline) + 6)) %>%
  # carry each person's values down into the newly created NA rows
  fill(group, baseline, baselineStart, postBaseline) %>%
  mutate(
    date = baselineStart + (7 * (obs - 1)),      # weekly steps from the start date
    pb_dummy = ifelse(obs <= baseline, 0, 1)     # 0 = baseline, 1 = postBaseline
  ) %>%
  # drop the rows beyond each person's own study length
  filter(obs <= (baseline + postBaseline)) %>%
  select(person, group, date, obs, pb_dummy)
full[21:30, ]
#> # A tibble: 10 x 5
#> person group date obs pb_dummy
#> <int> <int> <date> <dbl> <dbl>
#> 1 1 1 2018-10-22 21. 1.
#> 2 1 1 2018-10-29 22. 1.
#> 3 1 1 2018-11-05 23. 1.
#> 4 1 1 2018-11-12 24. 1.
#> 5 2 1 2018-06-04 1. 0.
#> 6 2 1 2018-06-11 2. 0.
#> 7 2 1 2018-06-18 3. 0.
#> 8 2 1 2018-06-25 4. 0.
#> 9 2 1 2018-07-02 5. 0.
#> 10 2 1 2018-07-09 6. 0.
Created on 2018-04-09 by the reprex package (v0.2.0).
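To see what complete and fill contribute on their own, here is a minimal sketch on a made-up two-person data frame. The data frame `tiny`, its column `start`, and all of its values are hypothetical and exist only for this illustration; the calls are the same ones used in the answer above. Note that fill is applied without grouping, which works here only because the first row of every person (obs 1) already holds that person's values.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-person data set, made up for this illustration only
tiny <- data.frame(
  person   = c(1, 2),
  baseline = c(2, 3),
  start    = as.Date(c("2018-06-04", "2018-07-02"))
)

expanded <- tiny %>%
  mutate(obs = 1) %>%
  # build every person/obs combination; the new rows have NA elsewhere
  complete(person, obs = 1:4) %>%
  # copy the last non-missing value downwards into those NA rows
  fill(baseline, start) %>%
  mutate(
    date     = start + 7 * (obs - 1),          # one week per observation
    pb_dummy = ifelse(obs <= baseline, 0, 1)   # 0 = baseline, 1 = postBaseline
  )

expanded
```

Each person ends up with four rows: person 1 switches to pb_dummy = 1 at obs 3, person 2 at obs 4.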
Answer 1 (score: 1)
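library(tidyverse)
mock %>%
  group_by(person) %>%
  mutate(
    obs = list(1:(baseline + postBaseline)),  # observation numbers for this person
    # weekly dates: the start date plus 0, 7, 14, ... days
    date = list(as.character(as.Date(baselineStart) + (seq_along(obs[[1]]) - 1) * 7)),
    postbaseline = list(as.integer(obs[[1]] > baseline))  # 1 after the baseline rows
  ) %>%
  select(group, date, obs, postbaseline) %>%
  # name the list-columns explicitly; a bare unnest() is deprecated since tidyr 1.0.0
  unnest(cols = c(date, obs, postbaseline))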
# A tibble: 488 x 5
# Groups: person [20]
person group date obs postbaseline
<int> <int> <chr> <int> <int>
1 1 1 2018-06-04 1 0
2 1 1 2018-06-11 2 0
3 1 1 2018-06-18 3 0
4 1 1 2018-06-25 4 0
5 1 1 2018-07-02 5 1
6 1 1 2018-07-09 6 1
7 1 1 2018-07-16 7 1
8 1 1 2018-07-23 8 1
9 1 1 2018-07-30 9 1
10 1 1 2018-08-06 10 1
# ... with 478 more rows
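As a rough cross-check of either answer, the expected rows can also be rebuilt with base R only. This is a sketch, not part of either answer: the names `n_obs`, `idx`, and `check` are made up here, and as.Date() stands in for lubridate::ymd() so that no packages are needed.

```r
# Starting data, rebuilt with base R only (as.Date() instead of ymd())
mock <- data.frame(
  person        = 1:20,
  group         = rep(1:5, each = 4),
  baseline      = rep(c(4, 6), times = 10),
  baselineStart = as.Date(rep(c("2018-06-04", "2018-07-02", "2018-07-30",
                                "2018-09-03", "2018-10-01"), each = 4)),
  postBaseline  = c(rep(20, 16), rep(c(18, 16), 2))
)

n_obs <- as.integer(mock$baseline + mock$postBaseline)  # rows needed per person
idx   <- rep(seq_len(nrow(mock)), times = n_obs)        # repeat each person's row
obs   <- sequence(n_obs)                                # 1..n_obs within each person

check <- data.frame(
  person   = mock$person[idx],
  group    = mock$group[idx],
  date     = mock$baselineStart[idx] + 7 * (obs - 1),   # weekly steps
  obs      = obs,
  pb_dummy = as.integer(obs > mock$baseline[idx])       # 1 = postBaseline
)

nrow(check)                      # 488, the row count reported above
check[check$person == 1, ]       # 24 rows, ending on 2018-11-12
```

The total of 488 rows and the last date for person 1 (2018-11-12) agree with the printouts in the two answers.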