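Question
I want to impute the NAs in my data frame, which come from a repeated-measures study. For this particular outcome, each NA should be imputed with the last observed non-NA value, plus 1 for every 52-week interval that has elapsed since that last observation.
Example
An example data frame; the goal column holds the values the imputation should produce.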
df <- data.frame(
subject = rep(1:3, each = 12),
week = rep(c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104),3),
value = c(112, 97, 130, 104, NA, NA, NA, NA, NA, NA, NA, NA,
89, 86, 94, 96, 88,107, 110, 102, 107, NA, NA, NA,
107, 110, 102, 130, 104, 88, 82, 79, 92, 106, NA, NA),
goal = c(112, 97, 130, 104, 104, 104, 104, 104, 104, 104, 105, 105,
89, 86, 94, 96, 88,107, 110, 102, 107, 107,107, 108,
107, 110, 102, 130, 104, 88, 82, 79, 92, 106, 106, 106)
)
Answer 0 (score: 4)
I've left the intermediate columns in to make it more obvious what is happening, but you can drop them with a simple select.
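library(dplyr)  # the zoo package must also be installed for zoo::na.locf

df = df %>%
  group_by(subject) %>%
  mutate(last_obs_week = max(week[!is.na(value)]),           # week of the last observed value
         since_last_week = pmax(0, week - last_obs_week),     # weeks elapsed since then (0 before it)
         inc_52 = since_last_week %/% 52,                      # number of completed 52-week intervals
         result = zoo::na.locf(value, na.rm = FALSE) + inc_52  # carry forward; na.rm = FALSE keeps the length
  )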
all(df$goal == df$result)
# [1] TRUE
print.data.frame(df)
# subject week value goal last_obs_week since_last_week inc_52 result
# 1 1 8 112 112 16 0 0 112
# 2 1 10 97 97 16 0 0 97
# 3 1 12 130 130 16 0 0 130
# 4 1 16 104 104 16 0 0 104
# 5 1 20 NA 104 16 4 0 104
# 6 1 26 NA 104 16 10 0 104
# 7 1 32 NA 104 16 16 0 104
# 8 1 44 NA 104 16 28 0 104
# 9 1 52 NA 104 16 36 0 104
# 10 1 64 NA 104 16 48 0 104
# 11 1 78 NA 105 16 62 1 105
# 12 1 104 NA 105 16 88 1 105
# 13 2 8 89 89 52 0 0 89
# ...
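The select step mentioned above might look like the following. This is a minimal sketch: `res` is a hypothetical two-row stand-in for the answer's output (values copied from the printed rows 10 and 11), not the full result.

```r
library(dplyr)

# Hypothetical stand-in for the answer's output (rows 10 and 11 of the printout)
res <- data.frame(
  subject = c(1, 1),
  week = c(64, 78),
  value = c(NA, NA),
  goal = c(104, 105),
  last_obs_week = c(16, 16),
  since_last_week = c(48, 62),
  inc_52 = c(0, 1),
  result = c(104, 105)
)

# Drop the helper columns, keeping only the original columns plus the result
cleaned <- res %>%
  select(-last_obs_week, -since_last_week, -inc_52)

names(cleaned)
```

If the data frame is still grouped by subject, select keeps the grouping column regardless, so this works the same before or after ungroup().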
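Answer 1 (score: 2)
One can get the desired result using dplyr and tidyr::fill. The logic is to add a column that tracks the week of the most recent non-NA value. Use tidyr::fill to carry forward both the last non-NA value and its week, then add 1 to the value for every full 52 weeks between the current week and the week of the last non-NA observation.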
library(dplyr)
library(tidyr)
df %>% group_by(subject) %>%
mutate(weekWithLastNonNaValue = ifelse(is.na(value), NA, week)) %>%
fill(value, weekWithLastNonNaValue) %>%
mutate(value = value + (week-weekWithLastNonNaValue) %/% 52) %>%
select(-weekWithLastNonNaValue) %>%
as.data.frame()
# subject week value goal
# 1 1 8 112 112
# 2 1 10 97 97
# 3 1 12 130 130
# 4 1 16 104 104
# 5 1 20 104 104
# 6 1 26 104 104
# 7 1 32 104 104
# 8 1 44 104 104
# 9 1 52 104 104
# 10 1 64 104 104
# 11 1 78 105 105
# 12 1 104 105 105
# 13 2 8 89 89
# 14 2 10 86 86
# 15 2 12 94 94
# 16 2 16 96 96
# 17 2 20 88 88
# 18 2 26 107 107
# 19 2 32 110 110
# 20 2 44 102 102
# ...
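For comparison, the same carry-forward-and-increment logic can be sketched in base R without any packages. The helper name `impute_plus_52` is hypothetical, and the sketch assumes the rows of a single subject are already sorted by week.

```r
# Hypothetical helper: impute NAs with the last observed value,
# plus 1 for each full 52 weeks elapsed since that observation.
# Assumes `value` and `week` belong to one subject, sorted by week.
impute_plus_52 <- function(value, week) {
  obs <- !is.na(value)
  # Index of the most recent observed row at each position (0 if none yet)
  idx <- cummax(ifelse(obs, seq_along(value), 0L))
  idx[idx == 0L] <- NA_integer_  # leading NAs have nothing to carry forward
  value[idx] + (week - week[idx]) %/% 52
}

# Subject 1 from the example data
week  <- c(8, 10, 12, 16, 20, 26, 32, 44, 52, 64, 78, 104)
value <- c(112, 97, 130, 104, rep(NA, 8))
out <- impute_plus_52(value, week)
out
# [1] 112  97 130 104 104 104 104 104 104 104 105 105
```

To cover every subject, the helper would need to be applied per group, for example over split(df, df$subject).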