Error when predicting with a random forest trained on PCA components

Asked: 2019-05-26 11:59:11

Tags: r random-forest r-caret predict caret

I am running a random forest classification on a forest, trying to detect invasive plants in it.

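The classification works fine when I run a plain RF, but as soon as I try the same RF classification on PCA (principal component analysis) components, it breaks.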

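#Packages used below
library(raster)  # brick, stack, extract, predict
library(sf)      # read_sf
library(caret)   # createDataPartition, trainControl, train, confusionMatrix

#Loading in our data as bricks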
sentinel10 <- brick("data/mosaics/10mbands_m.tif")
indices <- brick("data/mosaics/indices_m.tif")
sentinel20 <- brick("data/mosaics/20mbands_c.tif")

#Loading our validation data set
training <- read_sf("data/ValidationSet.shp")

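#Stacking our data into one stack
#NOTE: sentinel20_r is never defined in the posted code. Assumption: it is
#sentinel20 resampled to the 10 m grid, along the lines of the next line.
sentinel20_r <- resample(sentinel20, sentinel10)
allpredictors <- stack(sentinel10,indices,sentinel20_r)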

#extracting pixels where training data is present on all layers.
extr <- extract(allpredictors, training, df=TRUE)
extr <- merge(extr, training, by.x="ID", by.y="id")

#Splitting the training data
trainids <- createDataPartition(extr$class_c,list=FALSE,p=0.7)

#Defining training data and testing data
trainDat <- extr[trainids,]
testDat <- extr[-trainids,]
trainData_copy <-trainDat

#Cleaning and transforming data for PCA
test_clean <- testDat
test_clean[,c("ID", "path", "Rasterized", "X", "Y", "layer", "Class", 
"random", "class_c", "geometry")] <- NULL

trainDat_clean <- trainDat
trainDat_clean[,c("ID", "path", "Rasterized", "X", "Y", "layer", "Class", 
"random", "class_c", "geometry")] <- NULL

predictors1 <- c("SummerB2", "SummerB3", "SummerB4", "SummerB8", 
"AutumnB2", "AutumnB3","AutumnB4"....)

#Preprocessing training and test data for PCA
prComp <- prcomp(trainDat_clean, scale. = TRUE)

#preparing PCA for test predictions and confusion matrix,
pca_pred_test <- predict(prComp, newdata = test_clean)
pca_pred_test <- as.data.frame(pca_pred_test)
pca_pred_test <- pca_pred_test[,1:10]

#PCA model parameters
train.data<-data.frame(classe = trainDat$class_c, prComp$x)
train.data <- train.data[,1:11]

#Parameters
metric <- "Accuracy"
cross_validation <- trainControl(method="cv",search="random", number=10)
Number_of_tune_tries <- 15
train_predictors <- trainDat[,c(predictors1)]
preprocessed <- c("scale", "center")

#Random Forest#
#First model, with no preprocessing
model <- caret::train(train_predictors,trainDat$class_c, 
                  method="rf", 
                  metric=metric,
                  ntree=1000,
                  tuneLength= Number_of_tune_tries,
                  trControl=cross_validation,
                  importance=TRUE)

#trying the model on the test data.
rf_pred_test <- predict(model, testDat)
confusionMatrix(rf_pred_test, as.factor(testDat$class_c))

#Making the prediction
prediction <- predict(allpredictors,model, filename="prediction_tuned", 
progress ="window", overwrite=TRUE)

#Creating the RF+PCA model.
model_rf <- caret::train(classe~.,data=train.data, 
                     method="rf", 
                     metric=metric,
                     ntree=1000,
                     tuneLength= Number_of_tune_tries,
                     trControl=cross_validation)

#Making the prediction for the RF+PCA
prediction_PCA <- predict(allpredictors, model_rf, filename="prediction_rf_pca", 
progress ="window")

So, as I mentioned, the first random forest prediction works fine, but when I try to predict with model_rf, I get this error:

Error in eval(predvars, data, env) : object 'PC1' not found.

The model trains, but the prediction fails, and I don't know how to fix this.
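One plausible reading of the error is that model_rf was trained on columns named PC1, PC2, and so on, while allpredictors still carries the raw band layers, so predict cannot find the columns the model expects. The sketch below reproduces that situation with base R only. The iris data and lm() are stand-ins chosen so the example runs without the Sentinel files or caret; they are assumptions for illustration, not the setup from the question.

```r
# Minimal base-R sketch; iris and lm() are stand-ins (assumptions), not the
# Sentinel data or the random forest from the question.
set.seed(42)
idx <- sample(nrow(iris), 100)

# Raw predictors, split into training and test rows
cols <- c("Sepal.Width", "Petal.Length", "Petal.Width")
train_raw <- iris[idx, cols]
test_raw  <- iris[-idx, cols]

# Fit the PCA on the training rows only
pca <- prcomp(train_raw, center = TRUE, scale. = TRUE)

# Train on the component scores, so the model's predictors are PC1 and PC2
train_pc <- data.frame(y = iris$Sepal.Length[idx], pca$x[, 1:2])
fit <- lm(y ~ ., data = train_pc)

# Predicting on the raw columns fails: the model looks for PC1 in newdata
raw_error <- tryCatch(
  predict(fit, newdata = test_raw),
  error = function(e) conditionMessage(e)
)
print(raw_error)

# Projecting the new data through the same PCA first supplies those columns
test_pc <- as.data.frame(predict(pca, newdata = test_raw))[, 1:2]
pred <- predict(fit, newdata = test_pc)
```

In the raster setting, the analogous step would be to turn allpredictors into a stack of component layers before calling predict with model_rf. raster::predict has an index argument for models that return several columns, so something like predict(allpredictors, prComp, index = 1:10), with the resulting layers renamed PC1 to PC10, is one way to build that stack. This has not been tested against the data in the question.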

0 answers:

No answers yet