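To demonstrate the problem, since I cannot upload my data, I borrowed code from http://mkseo.pe.kr/stats/?p=719 (many thanks to Minkoo for the website).
The only thing I changed is 'bagImpute' to 'knnImpute', which is enough to reproduce the problem. I omitted the last few lines, which are not relevant to this question. Running this code results in:
Error in nn2(old[, cols, drop = FALSE], new[, cols, drop = FALSE], k = k) : no points in data!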
library(caret)
library(doMC) # For parallelism.
data(iris)
# 80% for training and 20% for verification.
# createDataPartition takes stratified samples,
# i.e., it preserves the proportion of each Species in both subsets.
inTrain <- createDataPartition(iris$Species, p=0.8, list=FALSE)
training <- iris[inTrain, ]
verification <- iris[-inTrain, ]
# Make some data (incl. verification) missing on purpose.
fillInNa <- function(d) {
  naCount <- NROW(d) * 0.1
  for (i in sample(NROW(d), naCount)) {
    d[i, sample(4, 1)] <- NA
  }
  return(d)
}
training <- fillInNa(training)
verification <- fillInNa(verification)
# The original code used bagged trees (bagImpute) here because missing
# values occur across all columns, noting that knnImpute is faster when
# just one column has NAs. This version uses knnImpute to reproduce the
# error. Also, note that preProcess is fit only on training. For
# verification, we reuse the preProc generated from training.
preProc <- preProcess(method="knnImpute", training[, 1:4])
training[, 1:4] <- predict(preProc, training[, 1:4])
verification[, 1:4] <- predict(preProc, verification[, 1:4])
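A side note for anyone trying to narrow this down: k-nearest-neighbour imputation needs rows to borrow values from, so it can help to look at how the missing values are distributed before calling preProcess. The sketch below uses base R only and is a diagnostic aid, not a confirmed explanation of the error. The seed and the names withNa, naPerColumn and completeRows are my own assumptions and do not appear in the code above.

```r
# Hypothetical diagnostic, not part of the original post: count the missing
# values per column and the number of complete rows before imputing.
set.seed(42)  # assumed seed, only so that the sketch is repeatable
data(iris)

# Same idea as fillInNa above: blank one random numeric cell in 10% of rows.
fillInNa <- function(d) {
  naCount <- floor(NROW(d) * 0.1)
  for (i in sample(NROW(d), naCount)) {
    d[i, sample(4, 1)] <- NA
  }
  d
}

withNa <- fillInNa(iris)

# Number of NAs in each of the four numeric columns.
naPerColumn <- colSums(is.na(withNa[, 1:4]))

# Number of rows with no NA in any numeric column.
completeRows <- sum(complete.cases(withNa[, 1:4]))

print(naPerColumn)
print(completeRows)
```

Running the same two checks on training[, 1:4] and verification[, 1:4] right before the predict calls shows whether either subset is short on complete rows.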
Answer 0 (score: 0)
The code you posted here works for me:
> verification[, 1:4] <- predict(preProc, verification[, 1:4])
> head(verification)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5    -1.0128842  0.03725657    -1.132276   -1.048478  setosa
7    -1.4933036  0.78647702    -1.344562   -1.206576  setosa
14   -1.8536182 -0.12129745    -1.512934   -1.470408  setosa
21   -0.5324648  0.78647702    -1.176189   -1.338492  setosa
23   -1.4933036  1.24036425    -1.569059   -1.338492  setosa
27   -1.0128842  0.78647702    -1.232313   -1.074660  setosa
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] RColorBrewer_1.0-5 knitr_1.6 magrittr_1.1.0 gtable_0.1.2 reshape2_1.4
[6] reshape_0.8.5 raster_2.2-31 sp_1.0-15 akima_0.5-11 randomForest_4.6-10
[11] earth_3.2-7 plotrix_3.5-7 plotmo_1.3-3 caret_6.0-30 lattice_0.20-29
[16] dplyr_0.2 plyr_1.8.1 ggplot2_1.0.0
loaded via a namespace (and not attached):
[1] assertthat_0.1 BradleyTerry2_1.0-5 brglm_0.5-9 car_2.0-20 codetools_0.2-8
[6] colorspace_1.2-4 digest_0.6.4 evaluate_0.5.5 foreach_1.4.2 formatR_0.10
[11] gtools_3.4.1 iterators_1.0.7 labeling_0.2 lme4_1.1-7 MASS_7.3-31
[16] Matrix_1.1-3 minqa_1.2.3 munsell_0.4.2 nlme_3.1-117 nloptr_1.0.4
[21] nnet_7.3-8 packrat_0.4.0.8 parallel_3.1.0 proto_0.3-10 Rcpp_0.11.2
[26] scales_0.2.4 splines_3.1.0 stringr_0.6.2 tools_3.1.0
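Since the code runs under the session listed above (R 3.1.0 with caret_6.0-30), comparing versions on the machine where it fails is a reasonable first step. Below is a minimal sketch for printing them. It assumes nothing beyond base R and the utils package, and the variable name rVersion is my own.

```r
# Print the running R version and, if it is installed, the caret version,
# so they can be compared with the session information above.
rVersion <- paste(R.version$major, R.version$minor, sep = ".")
cat("R version:", rVersion, "\n")

if (requireNamespace("caret", quietly = TRUE)) {
  cat("caret version:", as.character(utils::packageVersion("caret")), "\n")
} else {
  cat("caret is not installed\n")
}
```

If the versions differ, that does not prove the difference is the cause, but it tells you whether the two environments are comparable.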