I have a data frame, shown below as df. Its Bin_p1 and Bin_f1 columns were computed with the cut function:
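# Bin values into 200 intervals: (0,1], (1,6], (6,11], ..., (991,996]
Bins <- function(x) cut(x, breaks = c(0, seq(1, 1000, by = 5)), labels = 1:200)
# Apply the binning to every column except the first
binned <- as.data.frame(sapply(df[, -1], Bins))
# Prefix the new columns with "Bin_"
colnames(binned) <- paste("Bin", colnames(binned), sep = "_")
df <- cbind(df, binned)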
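To illustrate how these breaks map values to bin numbers, here is a minimal sketch in base R. The toy data frame and the conversion of the labels to integers are assumptions made for illustration only; the breaks and labels are the ones used in Bins.

```r
# Same breaks and labels as in Bins: 201 break points give 200 intervals.
breaks <- c(0, seq(1, 1000, by = 5))
Bins <- function(x) cut(x, breaks = breaks, labels = 1:200)

# Hypothetical toy columns, not the real data.
toy <- data.frame(p1 = c(20, NA), f1 = c(30, 0.5))

# cut() returns a factor; going through character yields integer bin numbers.
to_bin <- function(x) as.integer(as.character(Bins(x)))
binned <- as.data.frame(lapply(toy, to_bin))
colnames(binned) <- paste("Bin", colnames(binned), sep = "_")
toy <- cbind(toy, binned)

print(toy)
```

With these breaks, 20 falls in the interval (16, 21], which is bin 5, and 30 falls in (26, 31], which is bin 7. The last break is 996, so a value above 996 would get NA as its bin.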
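How can I now compute the mean of the previous two years and replace the NA values with it?
For example, in row 5 p1 is NA while f1 is 30, which falls in bin 7. The NA should be replaced with the mean of p1 from the previous two years within the same bin (7), which is (20 + 24) / 2 = 22. The input (df) and the expected output (df1) are: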
df
ID year p1 f1 Bin_p1 Bin_f1
1 2013 20 30 5 7
2 2013 24 29 5 7
3 2014 10 16 2 3
4 2014 11 17 2 3
5 2015 NA 30 NA 7
6 2016 10 NA 2 NA
df1
ID year p1 f1 Bin_p1 Bin_f1
1 2013 20 30 5 7
2 2013 24 29 5 7
3 2014 10 16 2 3
4 2014 11 17 2 3
5 2015 **22** 30 NA 7
6 2016 10 **16.5** 2 NA
Thanks in advance
Answer 0 (score: 1)
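I believe the following code produces the desired output. There may be a more elegant way to get the mean of the last two values of f1 than mean(rev(lag(f1))[1:2]), but this should do the job anyway.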
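library(dplyr)
df %>%
  arrange(year) %>%                          # oldest rows first
  mutate_at(c("p1", "f1"), "as.double") %>%  # convert both columns to double
  group_by(Bin_p1) %>%                       # rows sharing a p1 bin
  # replace a missing f1 with the mean of the two f1 values that precede the group's last row
  mutate(f1 = ifelse(is.na(f1), mean(rev(lag(f1))[1:2]), f1)) %>%
  group_by(Bin_f1) %>%                       # rows sharing an f1 bin
  # same replacement for a missing p1
  mutate(p1 = ifelse(is.na(p1), mean(rev(lag(p1))[1:2]), p1)) %>%
  ungroup()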
The output is:
# A tibble: 6 x 6
     ID  year    p1    f1 Bin_p1 Bin_f1
  <int> <dbl> <dbl> <dbl>  <dbl>  <dbl>
1     1  2013    20  30.0      5      7
2     2  2013    24  29.0      5      7
3     3  2014    10  16.0      2      3
4     4  2014    11  17.0      2      3
5     5  2015    22  30.0     NA      7
6     6  2016    10  16.5      2     NA
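To see what mean(rev(lag(f1))[1:2]) does inside one group, the steps can be traced by hand on the f1 values of the rows with Bin_p1 equal to 2. This is only a sketch in base R: lag1 is a hypothetical stand-in for dplyr's lag with its default shift of one, and fill_alt is just one possible alternative, not part of the code above.

```r
# f1 values of the rows with Bin_p1 == 2, ordered by year (rows 3, 4 and 6 of df).
f1 <- c(16, 17, NA)

# Hypothetical base R stand-in for dplyr::lag(): shift every value down by one.
lag1 <- function(x) c(NA, head(x, -1))

shifted  <- lag1(f1)             # NA 16 17
reversed <- rev(shifted)         # 17 16 NA
fill     <- mean(reversed[1:2])  # mean of the two values before the last row

# Possible alternative: average the last two observed (non-NA) values directly.
fill_alt <- mean(tail(f1[!is.na(f1)], 2))

# Replace the missing entry, as the ifelse() call in the pipeline does.
filled <- ifelse(is.na(f1), fill, f1)
print(filled)
```

Both variants give 16.5 here. The expression relies on the NA sitting in the last row of its group with at least two rows before it, and it averages the two preceding rows rather than two calendar years, which gives the expected values for this data but may not for other data.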