Error 'no points in data' when using knnImpute in the caret package

Time: 2014-08-09 15:37:04

Tags: r r-caret

I can't upload my data, so to demonstrate the problem I borrowed code from http://mkseo.pe.kr/stats/?p=719 (many thanks to Minkoo for his site).

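The only change I made is replacing 'bagImpute' with 'knnImpute' to reproduce the problem. I omitted the last few lines, which are not relevant to this issue. Running this code results in: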

Error in nn2(old[, cols, drop = FALSE], new[, cols, drop = FALSE], k = k) : no points in data!

library(caret)
library(doMC)  # For parallelism.

data(iris)
# 80% for training and 20% for verification.
# createDataPartition takes stratified samples,
# i.e., it takes the equal number of samples from each Species.
inTrain <- createDataPartition(iris$Species, p=0.8, list=FALSE)
training <- iris[inTrain, ]
verification <- iris[-inTrain, ]

# Make some data (incl. verification) missing on purpose.
fillInNa <- function(d) {
  naCount <- NROW(d) * 0.1
  for (i in sample(NROW(d), naCount)) {
    d[i, sample(4, 1)] <- NA
  }
  return(d)
}

training <- fillInNa(training)
verification <- fillInNa(verification)

# The original code used bagImpute here: with missing values across
# all columns it relied on bagged trees, keeping the faster knnImpute
# for the case where only one column has NAs. This version switches
# to knnImpute to reproduce the error. Note that preProcess is fit on
# the training data only; the verification set is transformed with
# the preProc object fitted on the training data.
preProc <- preProcess(training[, 1:4], method = "knnImpute")
training[, 1:4] <- predict(preProc, training[, 1:4])
verification[, 1:4] <- predict(preProc, verification[, 1:4])
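To make the idea behind knnImpute concrete: as far as I know, caret fills each missing value from the k closest training samples by Euclidean distance. The base-R sketch below is not caret's code. `knn_impute` is a hypothetical helper, and two details are assumptions chosen to illustrate the general technique: neighbours are drawn only from fully observed training rows, and the missing cell is filled with the neighbours' column mean. It also skips the centering and scaling caret applies. One consequence worth noting is that if no training row is complete, the neighbour search has nothing to search, so the helper stops with an explicit message.

```r
# Hypothetical helper: minimal k-nearest-neighbour imputation in base R.
# Illustration only; this is NOT caret's implementation.
knn_impute <- function(train, newdata, k = 5) {
  train <- as.matrix(train)
  newdata <- as.matrix(newdata)
  # Assumption: only fully observed training rows serve as neighbours
  ref <- train[complete.cases(train), , drop = FALSE]
  if (nrow(ref) == 0) stop("no complete training rows to search")
  k <- min(k, nrow(ref))
  out <- newdata
  for (i in which(!complete.cases(newdata))) {
    obs <- !is.na(newdata[i, ])
    if (!any(obs)) stop("row ", i, " has no observed values")
    # Euclidean distance computed on the columns observed in row i only
    diffs <- sweep(ref[, obs, drop = FALSE], 2, newdata[i, obs])
    dists <- sqrt(rowSums(diffs^2))
    nn <- order(dists)[seq_len(k)]
    # Fill each missing cell with that column's mean over the k neighbours
    out[i, !obs] <- colMeans(ref[nn, !obs, drop = FALSE])
  }
  out
}

# Usage on iris (a random split stands in for createDataPartition here)
data(iris)
set.seed(1)
x <- as.matrix(iris[, 1:4])
idx <- sample(nrow(x), 120)
train <- x[idx, ]
test <- x[-idx, ]
test[1, 2] <- NA  # knock out one value on purpose
imputed <- knn_impute(train, test, k = 5)
imputed[1, ]
```

Only rows that actually contain an NA are touched; every observed value passes through unchanged.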

1 answer:

Answer 0 (score: 0)

The code you posted works for me:

> verification[, 1:4] <- predict(preProc, verification[, 1:4])
> head(verification)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5    -1.0128842  0.03725657    -1.132276   -1.048478  setosa
7    -1.4933036  0.78647702    -1.344562   -1.206576  setosa
14   -1.8536182 -0.12129745    -1.512934   -1.470408  setosa
21   -0.5324648  0.78647702    -1.176189   -1.338492  setosa
23   -1.4933036  1.24036425    -1.569059   -1.338492  setosa
27   -1.0128842  0.78647702    -1.232313   -1.074660  setosa
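The values above are on a standardized scale. For example, rows 5 and 27 both have a Sepal.Length of 5.0 in iris and both appear as -1.0128842. As far as I know, caret's knnImpute also centers and scales the predictors, using means and standard deviations estimated on the training data. A minimal base-R sketch of that step follows; the random 120/30 split and the names `idx`, `mu`, `sigma`, and `verif_std` are illustrative assumptions, not caret objects.

```r
# Illustrative sketch: standardize with statistics learned on training rows only
data(iris)
set.seed(1)
x <- as.matrix(iris[, 1:4])
idx <- sample(nrow(x), 120)  # random 120/30 split (assumption)
train <- x[idx, ]
verif <- x[-idx, ]

mu <- colMeans(train, na.rm = TRUE)         # training column means
sigma <- apply(train, 2, sd, na.rm = TRUE)  # training column standard deviations

# Verification rows are shifted and scaled by the training parameters,
# never by their own mean and sd
verif_std <- scale(verif, center = mu, scale = sigma)
head(round(verif_std, 3))
```

Reusing the training parameters is what lets `predict(preProc, verification[, 1:4])` put both sets on the same scale.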

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.0-5  knitr_1.6           magrittr_1.1.0      gtable_0.1.2        reshape2_1.4       
 [6] reshape_0.8.5       raster_2.2-31       sp_1.0-15           akima_0.5-11        randomForest_4.6-10
[11] earth_3.2-7         plotrix_3.5-7       plotmo_1.3-3        caret_6.0-30        lattice_0.20-29    
[16] dplyr_0.2           plyr_1.8.1          ggplot2_1.0.0      

loaded via a namespace (and not attached):
 [1] assertthat_0.1      BradleyTerry2_1.0-5 brglm_0.5-9         car_2.0-20          codetools_0.2-8    
 [6] colorspace_1.2-4    digest_0.6.4        evaluate_0.5.5      foreach_1.4.2       formatR_0.10       
[11] gtools_3.4.1        iterators_1.0.7     labeling_0.2        lme4_1.1-7          MASS_7.3-31        
[16] Matrix_1.1-3        minqa_1.2.3         munsell_0.4.2       nlme_3.1-117        nloptr_1.0.4       
[21] nnet_7.3-8          packrat_0.4.0.8     parallel_3.1.0      proto_0.3-10        Rcpp_0.11.2        
[26] scales_0.2.4        splines_3.1.0       stringr_0.6.2       tools_3.1.0