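I have a data matrix in which the number of missing values differs from row to row. I want to replace the missing values with the row mean, but only in rows where the number of missing values is N (say, 1).
I have already written a solution for this, but it is rather inelegant, so I am looking for alternatives.
My solution: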
#SAMPLE DATA
a <- c(rep(c(1:4, NA), 2))
b <- c(rep(c(1:3, NA, 5), 2))
c <- c(rep(c(1:3, NA, 5), 2))
df <- as.matrix(cbind(a,b,c), ncol = 3, nrow = 10)
#CALCULATING THE NUMBER OF MISSING VALUES PER ROW
miss_row <- rowSums(apply(as.matrix(df), c(1, 2), function(x) {
  sum(is.na(x)) +
    sum(x == "", na.rm = TRUE)
}))
df <- cbind(df, miss_row)
#CALCULATING THE ROW MEANS FOR ROWS WITH 1 MISSING VALUE
row_mean <- ifelse(df[,4] == 1, rowMeans(df[,1:3], na.rm = TRUE), NA)
df <- cbind(df, row_mean)
Answer 0 (score: 5)
Here is the approach I mentioned in the comments, in more detail:
# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there
# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))
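# Replace row-wise by the row mean when there are exactly N NAs in the row
# (note: this overwrites every cell of the matching rows, not only the NA cells;
#  in this sample the non-missing values in those rows already equal the row mean)
N <- 1 # the given example
df[nb_NA_row == N] <- rowMeans(df, na.rm = TRUE)[nb_NA_row == N]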
# check df
df
#      a  b  c
# [1,] 1  1  1
# [2,] 2  2  2
# [3,] 3  3  3
# [4,] 4 NA NA
# [5,] 5  5  5
# [6,] 1  1  1
# [7,] 2  2  2
# [8,] 3  3  3
# [9,] 4 NA NA
#[10,] 5  5  5
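If the non-missing values in a row differ from each other, it may be preferable to fill only the NA cells and leave the observed values untouched. The following is a minimal sketch of that variant, not the answerer's code; the matrix `m` and the names `n_na`, `row_means` and `fill` are made up for illustration.

```r
# Illustrative data: row 2 has one NA and two different observed values
m <- rbind(c(1, 1, 1),
           c(NA, 4, 6),
           c(4, NA, NA))
N <- 1

n_na      <- rowSums(is.na(m))        # number of NAs per row
row_means <- rowMeans(m, na.rm = TRUE)

# TRUE only for NA cells that sit in a row with exactly N NAs
# (the length-nrow vector recycles down the columns, so it lines up with rows)
fill <- is.na(m) & (n_na == N)

# row(m)[fill] gives the row index of each cell to fill
m[fill] <- row_means[row(m)[fill]]
m
```

Here only the NA in the second row is replaced (by the mean of 4 and 6); the third row has two NAs and is left as it is.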
Answer 1 (score: 1)
# assumes df is still the original 3-column matrix cbind(a, b, c)
df <- data.frame(df)
df$miss_row <- rowSums(is.na(df))
df$row_mean <- NA
df$row_mean[df$miss_row == 1] <- rowMeans(df[df$miss_row == 1,1:3],na.rm = TRUE)
#     a  b  c miss_row row_mean
# 1   1  1  1        0       NA
# 2   2  2  2        0       NA
# 3   3  3  3        0       NA
# 4   4 NA NA        2       NA
# 5  NA  5  5        1        5
# 6   1  1  1        0       NA
# 7   2  2  2        0       NA
# 8   3  3  3        0       NA
# 9   4 NA NA        2       NA
# 10 NA  5  5        1        5
(This gives your expected output, which does not seem to match your text exactly; see the comments and the linked duplicate.)
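If the aim is to fill the gaps rather than keep the mean in a separate column, the helper column can be copied back into the NA cells. This is a rough sketch under the assumption that the data columns are named a, b and c as in the sample; `cols`, `one_missing` and the loop are illustrative additions, not part of the answer.

```r
# Rebuild the sample data as a data frame
a <- rep(c(1:4, NA), 2)
b <- rep(c(1:3, NA, 5), 2)
c <- rep(c(1:3, NA, 5), 2)
df <- data.frame(a, b, c)
cols <- c("a", "b", "c")   # data columns (assumed names)

# Helper columns, computed on the data columns only
df$miss_row <- rowSums(is.na(df[cols]))
df$row_mean <- NA
one_missing <- df$miss_row == 1
df$row_mean[one_missing] <- rowMeans(df[one_missing, cols], na.rm = TRUE)

# Copy row_mean into the NA cells of rows with exactly one missing value
for (col in cols) {
  gap <- one_missing & is.na(df[[col]])
  df[[col]][gap] <- df$row_mean[gap]
}
df
```

Rows 4 and 9 keep their two NAs, because they do not meet the "exactly one missing value" condition.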