R: impute NAs with a linear increase based on time interval

Time: 2018-07-24 19:34:49

Tags: r dplyr zoo

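Question

I want to impute the NAs in my data frame, which comes from a repeated-measures study. For this particular outcome, each NA should be imputed as the last observed non-NA value, plus 1 for every complete 52-week interval that has elapsed since that most recent observation.

Example

An example data frame; the goal column holds the target imputation.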

df <- data.frame(
  subject = rep(1:3, each = 12),
  week = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104),3),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
            89, 86, 94, 96, 88,107, 110, 102, 107, NA, NA, NA,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, NA, NA),
  goal = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
            89, 86, 94, 96, 88,107, 110, 102, 107, 107,107, 108,
            107, 110, 102, 130, 104, 88, 82, 79, 92, 106, 106, 106)
)
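To make the goal column concrete, here is a minimal sketch of the rule applied by hand to subject 1, using base R only. The variable names (last_value, last_week, na_weeks, intervals, imputed) are illustrative and not part of the original question.

```r
# Subject 1: the last non-NA value is 104, observed at week 16
last_value <- 104
last_week  <- 16
na_weeks   <- c(20, 26, 32, 44, 52, 64, 78, 104)  # weeks where value is NA

# Number of complete 52-week intervals since the last observation
intervals <- (na_weeks - last_week) %/% 52

# Imputed value: last observation plus 1 per complete interval
imputed <- last_value + intervals
print(data.frame(week = na_weeks, intervals = intervals, imputed = imputed))
```

Weeks 78 and 104 are 62 and 88 weeks past week 16, so each picks up one increment, matching the 105 entries in goal.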

2 answers:

Answer 0 (score: 4)

I left the intermediate columns in to make it more obvious what is happening, but you can drop them with a simple select.

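library(dplyr)

df = df %>%
  group_by(subject) %>%
  mutate(last_obs_week = max(week[!is.na(value)]),         # last week with an observed value
         since_last_week = pmax(0, week - last_obs_week),  # weeks elapsed since then
         inc_52 = since_last_week %/% 52,                  # complete 52-week intervals
         # na.rm = FALSE keeps the vector length if a subject starts with NA
         result = zoo::na.locf(value, na.rm = FALSE) + inc_52
  )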

all(df$goal == df$result)
# [1] TRUE

print.data.frame(df)
#    subject week value goal last_obs_week since_last_week inc_52 result
# 1        1    8   112  112            16               0      0    112
# 2        1   10    97   97            16               0      0     97
# 3        1   12   130  130            16               0      0    130
# 4        1   16   104  104            16               0      0    104
# 5        1   20    NA  104            16               4      0    104
# 6        1   26    NA  104            16              10      0    104
# 7        1   32    NA  104            16              16      0    104
# 8        1   44    NA  104            16              28      0    104
# 9        1   52    NA  104            16              36      0    104
# 10       1   64    NA  104            16              48      0    104
# 11       1   78    NA  105            16              62      1    105
# 12       1  104    NA  105            16              88      1    105
# 13       2    8    89   89            52               0      0     89
# ...
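The select step mentioned above might look like the following. This is a sketch on subject 1 only, assuming dplyr and zoo are installed; the names df1 and out are illustrative.

```r
library(dplyr)

# Subject 1 from the question's example data
df1 <- data.frame(
  subject = 1,
  week  = c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA)
)

out <- df1 %>%
  group_by(subject) %>%
  mutate(last_obs_week   = max(week[!is.na(value)]),
         since_last_week = pmax(0, week - last_obs_week),
         inc_52          = since_last_week %/% 52,
         result          = zoo::na.locf(value, na.rm = FALSE) + inc_52) %>%
  ungroup() %>%
  # drop the helper columns once result has been computed
  select(-last_obs_week, -since_last_week, -inc_52)

print(as.data.frame(out))
```

Only subject, week, value and result remain, and the original value column is left untouched for comparison.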

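Answer 1 (score: 2)

You can get the desired result with dplyr and tidyr::fill. The logic is to add a column that records the week of each non-NA value, carry both value and that week forward with tidyr::fill, and then add 1 to the filled value for every complete 52 weeks between the current week and the week of the most recent non-NA value.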

library(dplyr)
library(tidyr)

df %>% group_by(subject) %>%
  mutate(weekWithLastNonNaValue = ifelse(is.na(value), NA, week)) %>%
  fill(value, weekWithLastNonNaValue) %>%
  mutate(value = value + (week-weekWithLastNonNaValue) %/% 52) %>%
  select(-weekWithLastNonNaValue) %>%
  as.data.frame()

#    subject week value goal
# 1        1    8   112  112
# 2        1   10    97   97
# 3        1   12   130  130
# 4        1   16   104  104
# 5        1   20   104  104
# 6        1   26   104  104
# 7        1   32   104  104
# 8        1   44   104  104
# 9        1   52   104  104
# 10       1   64   104  104
# 11       1   78   105  105
# 12       1  104   105  105
# 13       2    8    89   89
# 14       2   10    86   86
# 15       2   12    94   94
# 16       2   16    96   96
# 17       2   20    88   88
# 18       2   26   107  107
# 19       2   32   110  110
# 20       2   44   102  102
# ...
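For readers who prefer to avoid extra packages, the same idea can be sketched in base R. The helper name impute_step52 and its step argument are hypothetical, and the sketch assumes rows are sorted by week within each subject. Like Answer 1, it measures elapsed time from the most recent non-NA row, so an NA that sits between two observations is filled from the earlier one; Answer 0 instead measures from the subject's last observed week overall. The two agree on the example data, where all NAs are trailing.

```r
# Hypothetical helper: carry the last observation forward and add 1
# per complete `step` weeks elapsed since that observation.
impute_step52 <- function(week, value, step = 52) {
  observed <- !is.na(value)
  # Index of the most recent observed row at or before each row (0 if none)
  idx <- cummax(seq_along(value) * observed)
  idx[idx == 0] <- NA  # leading NAs have nothing to carry forward
  value[idx] + (week - week[idx]) %/% step
}

# Subjects 1 and 2 from the question's example data
df <- data.frame(
  subject = rep(1:2, each = 12),
  week  = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104), 2),
  value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
            89, 86, 94, 96, 88, 107, 110, 102, 107, NA, NA, NA),
  goal  = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
            89, 86, 94, 96, 88, 107, 110, 102, 107, 107, 107, 108)
)

# Apply the helper to each subject's rows, preserving row order
result <- numeric(nrow(df))
for (rows in split(seq_len(nrow(df)), df$subject)) {
  result[rows] <- impute_step52(df$week[rows], df$value[rows])
}
df$result <- result

print(all(df$goal == df$result))
```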