I have this data frame, which contains NAs.
DATE <- c("1","2","3","4","5","6","7","1","2","3","4","5","6","7")
COMP <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B")
BM <- c(12,11,NA,14,NA,15,NA, 5, 5, NA, 6, NA, 8, 9)
df <- data.frame(DATE, COMP, BM, stringsAsFactors=FALSE)
df
# DATE COMP BM
# 1 1 A 12
# 2 2 A 11
# 3 3 A NA
# 4 4 A 14
# 5 5 A NA
# 6 6 A 15
# 7 7 A NA
# 8 1 B 5
# 9 2 B 5
# 10 3 B NA
# 11 4 B 6
# 12 5 B NA
# 13 6 B 8
# 14 7 B 9
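I want to replace those NAs with the mean of the values in the previous and the following row (only within the same company, of course). If the first row of a company is NA, it should take the value from the following row; if the last row is NA, it should take the value from the second-to-last row.
The output should look like this: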
# DATE COMP BM
# 1 1 A 12
# 2 2 A 11
# 3 3 A 12.5
# 4 4 A 14
# 5 5 A 14.5
# 6 6 A 15
# 7 7 A 15
# 8 1 B 5
# 9 2 B 5
# 10 3 B 5.5
# 11 4 B 6
# 12 5 B 7
# 13 6 B 8
# 14 7 B 9
Thanks!
Answer 0 (score: 4)
This is a job for zoo::na.approx:
library(plyr)
library(zoo)
ddply(df, .(COMP), transform, BM=na.approx(BM, rule=2))
# DATE COMP BM
# 1 1 A 12.0
# 2 2 A 11.0
# 3 3 A 12.5
# 4 4 A 14.0
# 5 5 A 14.5
# 6 6 A 15.0
# 7 7 A 15.0
# 8 1 B 5.0
# 9 2 B 5.0
# 10 3 B 5.5
# 11 4 B 6.0
# 12 5 B 7.0
# 13 6 B 8.0
# 14 7 B 9.0
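Note that na.approx interpolates linearly, which matches "mean of the previous and next value" only when an NA stands alone between two observed values; rule=2 is what carries the nearest observed value out to a leading or trailing NA. If you would rather not load zoo and plyr, the same idea can be sketched in base R with approx() and ave(). This is only a minimal sketch: fill_linear is a made-up helper name, and it assumes the rows are already sorted by DATE within each COMP and that every company has at least two non-NA values.

```r
# Rebuild the example data from the question.
DATE <- c("1","2","3","4","5","6","7","1","2","3","4","5","6","7")
COMP <- c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B")
BM <- c(12,11,NA,14,NA,15,NA, 5, 5, NA, 6, NA, 8, 9)
df <- data.frame(DATE, COMP, BM, stringsAsFactors=FALSE)

# Hypothetical helper (not part of any package): linear interpolation over the
# row positions of one group. approx() drops the NA pairs itself, and rule=2
# repeats the nearest observed value beyond the first/last observation.
# Assumption: the group has at least two non-NA values, otherwise approx() errors.
fill_linear <- function(v) {
  idx <- seq_along(v)
  approx(x = idx, y = v, xout = idx, rule = 2)$y
}

# ave() applies the helper separately within each company and keeps row order.
filled <- df
filled$BM <- ave(df$BM, df$COMP, FUN = fill_linear)
filled
```

On the example data this should give the same BM column as the ddply call above.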
Edit
In response to the comments: you need to handle the cases where a group has only one non-NA value, or only NA values.
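my.na.approx <- function(x) {
  # nothing to interpolate from: return the input unchanged
  if (sum(is.finite(x)) == 0L) return(x)
  # a single observed value: spread it over the whole vector
  if (sum(is.finite(x)) == 1L) return(na.approx(x, rule=2, method="constant"))
  # otherwise interpolate linearly and extend the end values outwards
  na.approx(x, rule=2)
}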
my.na.approx(c(NA, 1, NA, NA, 2, NA))
#[1] 1.000000 1.000000 1.333333 1.666667 2.000000 2.000000
my.na.approx(c(NA, NA, NA, NA, 2, NA))
#[1] 2 2 2 2 2 2
my.na.approx(c(NA, NA, NA, NA, NA, NA))
#[1] NA NA NA NA NA NA
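To use the wrapper per company, it can simply take the place of na.approx in the ddply call. The snippet below is a sketch on made-up data (df2, with companies "C" and "D", is purely illustrative and not from the question): "C" has a single observation and "D" has none.

```r
library(plyr)
library(zoo)

# Same wrapper as above, repeated so the snippet runs on its own.
my.na.approx <- function(x) {
  if (sum(is.finite(x)) == 0L) return(x)
  if (sum(is.finite(x)) == 1L) return(na.approx(x, rule=2, method="constant"))
  na.approx(x, rule=2)
}

# Hypothetical data: A is a normal group, C has one value, D is all NA.
df2 <- data.frame(
  DATE = rep(c("1", "2", "3"), times = 3),
  COMP = rep(c("A", "C", "D"), each = 3),
  BM   = c(12, NA, 14,  NA, 7, NA,  NA, NA, NA),
  stringsAsFactors = FALSE
)

# Apply the wrapper within each company.
res <- ddply(df2, .(COMP), transform, BM = my.na.approx(BM))
res
```

Groups without any observation stay NA, so you may want to filter them out or handle them separately afterwards.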