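Which techniques (for example KNN or maximum likelihood) can I use to impute missing values? I want to use R and am trying to find a suitable technique for imputing the missing values.
The sample data looks like this: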
F1 F2 F3 F4 F5 Class
Good 20 5 7 Old Normal
Good Missing 8 8 Old Normal
Good 15 10 10 Old Normal
Good 50 10 10 Old Normal
Good 70 10 10 Old Abnormal
Bad 20 5 7 Old Abnormal
Good 20 5 80 Old Abnormal
Good 85 100 100 Old Abnormal
Good 20 100 Missing Old Abnormal
Good 24 6 8.4 Old Normal
Good 12 9.6 9.6 Old Normal
Good 18 12 12 Old Normal
Good 60 12 12 Old Normal
Good 84 Missing 12 Old Abnormal
Bad 24 6 8.4 Old Abnormal
Good 24 6 96 Old Abnormal
Good 102 120 120 Old Abnormal
Good 24 120 72 Old Abnormal
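To try any of the techniques below on this sample, the table first has to be read into R with the `Missing` entries recognized as `NA`. A minimal sketch, assuming the whitespace-separated layout shown above; the object name `dat` is hypothetical:

```r
# Read the sample table from the question; the literal string "Missing"
# is converted to NA. The object name `dat` is hypothetical.
dat <- read.table(text = "
F1   F2      F3      F4      F5   Class
Good 20      5       7       Old  Normal
Good Missing 8       8       Old  Normal
Good 15      10      10      Old  Normal
Good 50      10      10      Old  Normal
Good 70      10      10      Old  Abnormal
Bad  20      5       7       Old  Abnormal
Good 20      5       80      Old  Abnormal
Good 85      100     100     Old  Abnormal
Good 20      100     Missing Old  Abnormal
Good 24      6       8.4     Old  Normal
Good 12      9.6     9.6     Old  Normal
Good 18      12      12      Old  Normal
Good 60      12      12      Old  Normal
Good 84      Missing 12      Old  Abnormal
Bad  24      6       8.4     Old  Abnormal
Good 24      6       96      Old  Abnormal
Good 102     120     120     Old  Abnormal
Good 24      120     72      Old  Abnormal
", header = TRUE, na.strings = "Missing", stringsAsFactors = TRUE)

str(dat)             # F2-F4 numeric (F2 stored as integer); F1, F5, Class factors
colSums(is.na(dat))  # one NA each in F2, F3 and F4
```

Note that F5 only ever takes the value `Old`, so it carries no information for an imputation model and could be dropped before imputing.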
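Answer (score: 1)
Here is some code that can help with the analysis (replace `..name of data..`, `clust.datatrain` and `datafile` with your own data frame):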
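# Does the data frame contain any missing values at all?
any(is.na(..name of data..))
# Plot the amount and pattern of missingness per variable (VIM package)
library(VIM)
aggr(..name of data.., plot = TRUE, bars = TRUE)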
# Number and proportion of missing values in each column
propmiss <- function(dataframe) lapply(dataframe, function(x) data.frame(nmiss = sum(is.na(x)), n = length(x), propmiss = sum(is.na(x)) / length(x)))
propmiss(..name of data..)
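The same per-column summary as `propmiss` can be obtained in base R without a helper function, because `is.na()` on a data frame returns a logical matrix. A minimal sketch on a hypothetical data frame `toy`:

```r
# Hypothetical toy data with a few NAs
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, NA, "x", "y"),
                  c = c(10, 20, 30, 40))

miss <- data.frame(nmiss    = colSums(is.na(toy)),   # count of NAs per column
                   n        = nrow(toy),             # number of rows
                   propmiss = colMeans(is.na(toy)))  # share of NAs per column
miss
```

This returns one row per variable instead of a list of one-row data frames, which is easier to sort or plot.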
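# Drop rows in which more than half of the columns are missing;
# such rows carry too little information to impute reliably
sparse.rows <- which(rowSums(is.na(clust.datatrain)) > 0.5 * ncol(clust.datatrain))
length(sparse.rows)  # 25 in the answerer's data
# Guard against an empty index: x[-integer(0), ] would drop every row
if (length(sparse.rows) > 0) {
  clust.datatrain <- clust.datatrain[-sparse.rows, ]
}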
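One pitfall when removing rows by negative index: if no row exceeds the threshold, the index vector is empty, and `x[-integer(0), ]` selects no rows instead of all of them. A small demonstration on a hypothetical data frame `toy`, with a logical-index alternative that behaves correctly either way:

```r
# Hypothetical toy data: no row is more than half missing
toy <- data.frame(x = 1:3, y = c(1, NA, 3))

idx <- which(rowSums(is.na(toy)) > 0.5 * ncol(toy))
length(idx)         # 0
nrow(toy[-idx, ])   # 0: negating an empty index drops every row

# Logical indexing keeps rows at or below the threshold, empty or not
kept <- toy[rowSums(is.na(toy)) <= 0.5 * ncol(toy), ]
nrow(kept)          # 3
```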
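# k-NN imputation: each NA in a numeric column is filled with a distance-weighted
# average of the values in the k = 10 nearest complete rows.
# DMwR has been archived on CRAN and may need to be installed from the archive.
library(DMwR)
train.1 <- knnImputation(clust.datatrain, k = 10, scale = TRUE, meth = "weighAvg",
                         distData = NULL)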
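To see roughly what k-NN imputation does, here is a simplified base-R sketch for numeric columns only. It is not `knnImputation()`'s exact algorithm: distances are Euclidean on standardized columns, computed over the variables observed in the incomplete row, and each NA is replaced by the plain (unweighted) mean of that variable among the k nearest complete rows. The function name `knn_impute_simple` is hypothetical.

```r
# Hypothetical helper: simplified k-NN imputation for all-numeric data frames.
knn_impute_simple <- function(df, k = 3) {
  stopifnot(all(vapply(df, is.numeric, logical(1))))
  x  <- as.matrix(df)
  sx <- scale(x)                       # standardize so no column dominates (NAs kept)
  donors <- which(complete.cases(x))   # candidate neighbours: complete rows only
  for (i in which(!complete.cases(x))) {
    obs <- !is.na(x[i, ])              # columns observed in row i
    # Euclidean distance from row i to every donor, on observed columns only
    d <- sqrt(colSums((t(sx[donors, obs, drop = FALSE]) - sx[i, obs])^2))
    nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
    miss <- which(!obs)
    # Unweighted mean of the neighbours' raw values for each missing column
    x[i, miss] <- colMeans(x[nn, miss, drop = FALSE])
  }
  as.data.frame(x)
}

# Usage on hypothetical data: row 5 is closest to rows 2 and 3 on column b
toy <- data.frame(a = c(1, 2, 3, 10, NA),
                  b = c(1, 2, 3, 10, 2.4))
res <- knn_impute_simple(toy, k = 2)
res$a[5]  # 2.5, the mean of a in rows 2 and 3
```

Standardizing first matters because otherwise a variable on a large scale (such as F2 in the question) would dominate the distance.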
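# Multiple imputation by chained equations: 5 imputed datasets, 50 iterations,
# Bayesian linear regression ("norm") for each incomplete numeric variable
library(mice)
xdash <- mice(datafile, m = 5, maxit = 50, method = "norm", seed = 500)
completedata <- mice::complete(xdash, 1)  # first of the 5 completed datasets
completedata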
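Keeping only the first completed dataset discards the between-imputation variability that multiple imputation is meant to capture. The usual mice workflow fits the analysis model on each of the m imputed datasets with `with()` and combines the estimates using Rubin's rules via `pool()`. A minimal sketch on the `nhanes` example data bundled with mice; the regression formula is only an illustration:

```r
library(mice)

# nhanes: small all-numeric example dataset shipped with mice (age, bmi, hyp, chl)
imp <- mice(nhanes, m = 5, maxit = 10, method = "norm", seed = 500, printFlag = FALSE)

# Fit the same regression on each of the 5 completed datasets
fits <- with(imp, lm(chl ~ bmi + age))

# Combine the 5 sets of estimates and standard errors with Rubin's rules
pooled <- summary(pool(fits))
pooled
```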
All of this should help with the analysis and the imputation!