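Expand a data frame to add new observations for each person, conditional on column values

Time: 2018-04-09 21:45:27

Tags: r

I am trying to create a toy example of a dataset in which persons are nested within groups. Each person has 4 or 6 baseline observations and at most 20 postBaseline observations. This is the starting point: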

   person group baseline baselineStart postBaseline
1       1     1        4    2018-06-04           20
2       2     1        6    2018-06-04           20
3       3     1        4    2018-06-04           20
4       4     1        6    2018-06-04           20
5       5     2        4    2018-07-02           20
6       6     2        6    2018-07-02           20
7       7     2        4    2018-07-02           20
8       8     2        6    2018-07-02           20
9       9     3        4    2018-07-30           20
10     10     3        6    2018-07-30           20
11     11     3        4    2018-07-30           20
12     12     3        6    2018-07-30           20
13     13     4        4    2018-09-03           20
14     14     4        6    2018-09-03           20
15     15     4        4    2018-09-03           20
16     16     4        6    2018-09-03           20
17     17     5        4    2018-10-01           18
18     18     5        6    2018-10-01           16
19     19     5        4    2018-10-01           18
20     20     5        6    2018-10-01           16

I would like the result to look like this mini example for person 1:

[Image: mini example of the desired output for person 1]

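So I am looking for a way to create 4 baseline rows for person 1, starting at 2018-06-04, followed by 20 postBaseline rows. The observation date should increase by 1 week from one row to the next.

Here is the initial data frame: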

library(lubridate)  # provides ymd()

mock <- data.frame(person = seq(from=1, to=20),
                   group = rep(1:5, times=1, each=4),
                   baseline = rep(c(4, 6), times=2, each=1),
                   baselineStart = ymd(rep(c("2018-06-04",
                                             "2018-07-02",
                                             "2018-07-30",
                                             "2018-09-03",
                                             "2018-10-01"),
                                           times=1,
                                           each=4)
                   ),
                   postBaseline = c(rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(c(18, 16), 2)
                   )
)

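2 Answers:

Answer 0: (score: 1)

Here is a solution using the tidyverse. complete is used to expand the data frame so that it contains every combination of obs and person, and fill then fills in the missing values. We simply add 7 * (obs - 1) days to baselineStart to generate the correct weekly dates, and compare obs with baseline to decide whether a row is postBaseline. Finally, we filter out the rows where obs is greater than the length of the study for that person (given by baseline + postBaseline). I show a printout of the boundary between persons 1 and 2, which shows that person 1 ends when expected and that pb_dummy changes correctly.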

mock <- data.frame(person = seq(from=1, to=20),
                   group = rep(1:5, times=1, each=4),
                   baseline = rep(c(4, 6), times=2, each=1),
                   baselineStart = lubridate::ymd(rep(c("2018-06-04",
                                             "2018-07-02",
                                             "2018-07-30",
                                             "2018-09-03",
                                             "2018-10-01"),
                                           times=1,
                                           each=4)
                   ),
                   postBaseline = c(rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(20, 4),
                                    rep(c(18, 16), 2)
                   )
)

library(tidyverse)
full <- mock %>%
  mutate(obs = 1) %>%
  complete(person, obs = 1:(max(postBaseline) + 6)) %>%
  fill(group, baseline, baselineStart, postBaseline) %>%
  mutate(
    date = baselineStart + (7 * (obs - 1)),
    pb_dummy = ifelse(obs <= baseline, 0, 1)
  ) %>%
  filter(obs <= (baseline + postBaseline)) %>%
  select(person, group, date, obs, pb_dummy)

full[21:30, ]
#> # A tibble: 10 x 5
#>    person group date         obs pb_dummy
#>     <int> <int> <date>     <dbl>    <dbl>
#>  1      1     1 2018-10-22   21.       1.
#>  2      1     1 2018-10-29   22.       1.
#>  3      1     1 2018-11-05   23.       1.
#>  4      1     1 2018-11-12   24.       1.
#>  5      2     1 2018-06-04    1.       0.
#>  6      2     1 2018-06-11    2.       0.
#>  7      2     1 2018-06-18    3.       0.
#>  8      2     1 2018-06-25    4.       0.
#>  9      2     1 2018-07-02    5.       0.
#> 10      2     1 2018-07-09    6.       0.

Created on 2018-04-09 by the reprex package (v0.2.0).
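To make the complete / fill step easier to follow, here is a minimal sketch on a two-person toy data frame. The names toy, n_obs, start and toy_long are hypothetical and not part of the question's data; the pipeline follows the same steps as the answer above, only with fewer columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: person 1 needs 2 rows, person 2 needs 3 rows
toy <- data.frame(
  person = c(1, 2),
  n_obs  = c(2, 3),
  start  = as.Date(c("2018-06-04", "2018-07-02"))
)

toy_long <- toy %>%
  mutate(obs = 1) %>%                    # each person starts with observation 1
  complete(person, obs = 1:3) %>%        # add rows for obs 2 and 3 (other columns are NA)
  fill(n_obs, start) %>%                 # carry each person's values down into the new rows
  mutate(date = start + 7 * (obs - 1)) %>%  # one week per observation
  filter(obs <= n_obs)                   # drop rows beyond each person's length

toy_long
```

The result should have five rows: two for person 1 and three for person 2. Note that fill works here only because complete returns the rows ordered by person and then obs, so every block of NA rows sits directly below the row holding that person's values.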

Answer 1: (score: 1)

library(tidyverse)
mock %>%
  group_by(person) %>%
  mutate(obs = list(1:(baseline + postBaseline)),  # observation index for each person
         # weekly dates: baselineStart for obs 1, then one more week per observation
         date = list(as.character(as.Date(baselineStart) + 1:length(obs[[1]]) * 7 - 7)),
         # 0 during baseline, 1 once obs exceeds the number of baseline observations
         postbaseline = list(as.integer(obs[[1]] > baseline))) %>%
  select(group, date, obs, postbaseline) %>%
  unnest()

# A tibble: 488 x 5
# Groups:   person [20]
   person group date         obs postbaseline
    <int> <int> <chr>      <int>        <int>
 1      1     1 2018-06-04     1            0
 2      1     1 2018-06-11     2            0
 3      1     1 2018-06-18     3            0
 4      1     1 2018-06-25     4            0
 5      1     1 2018-07-02     5            1
 6      1     1 2018-07-09     6            1
 7      1     1 2018-07-16     7            1
 8      1     1 2018-07-23     8            1
 9      1     1 2018-07-30     9            1
10      1     1 2018-08-06    10            1
# ... with 478 more rows
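As a cross-check on the row count, the same expansion can be sketched in base R by repeating each row index baseline + postBaseline times. This is a different technique from either answer (row replication with rep and sequence), and the names n_obs, idx and long are introduced here purely for illustration. The data frame is rebuilt with as.Date so the block runs without lubridate.

```r
# Rebuild the question's data with base R only (as.Date instead of lubridate::ymd)
mock <- data.frame(
  person = 1:20,
  group = rep(1:5, each = 4),
  baseline = rep(c(4, 6), times = 10),
  baselineStart = as.Date(rep(c("2018-06-04", "2018-07-02", "2018-07-30",
                                "2018-09-03", "2018-10-01"), each = 4)),
  postBaseline = c(rep(20, 16), rep(c(18, 16), 2))
)

n_obs <- mock$baseline + mock$postBaseline      # rows needed per person
idx   <- rep(seq_len(nrow(mock)), times = n_obs)  # repeat each row index n_obs times
obs   <- sequence(n_obs)                        # 1..n_obs within each person

long <- data.frame(
  person       = mock$person[idx],
  group        = mock$group[idx],
  date         = mock$baselineStart[idx] + 7 * (obs - 1),  # weekly steps
  obs          = obs,
  postbaseline = as.integer(obs > mock$baseline[idx])      # 0 = baseline, 1 = postBaseline
)

nrow(long)  # 488
```

The total of 488 rows agrees with the tibble header printed above: 8 persons with 24 rows, 8 with 26 rows, and 4 with 22 rows.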