Problem predicting to new raster data using RFE

Time: 2019-10-28 10:20:28

Tags: svm predict caret rfe

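I am running rfe on a linear SVM model with a tuning grid, using caret in R (as an example; I am also trying radial and polynomial SVMs). I am trying to reduce the number of predictors (I have 62 in total) and then want to use the final model to predict on new data. I assume that once I have the final set of predictors, the new data I pass to the predict function should contain only those selected predictors, rather than the full predictor set?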

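I am predicting onto a raster stack of the variables, and I would like to know whether the final model automatically preprocesses these variables, or whether I have to do that myself before predicting?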
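As a point of reference for the preprocessing question, here is a minimal base-R sketch of what "center" and "scale" involve at prediction time: the means and standard deviations are estimated on the training data and reused unchanged on the new data. The data frames `train_x` and `new_x` and their column names are hypothetical and not taken from the question. To my understanding, a caret `train` object stores its `preProcess` step and `predict()` on that object reapplies it, but that is worth verifying against the caret documentation.

```r
# Hypothetical toy data: two predictors on very different scales
set.seed(1)
train_x <- data.frame(elev = rnorm(50, 500, 100), ndvi = runif(50, 0, 1))
new_x   <- data.frame(elev = c(450, 620), ndvi = c(0.2, 0.8))

# Centering/scaling statistics are estimated from the training data only
ctr <- colMeans(train_x)
scl <- apply(train_x, 2, sd)

# The same training statistics are then applied to the new data
new_scaled <- scale(new_x, center = ctr, scale = scl)

# For contrast: standardising the new data with its own statistics
new_wrong <- scale(new_x)
```

If the new layers were standardised with their own statistics instead (`new_wrong`), the values fed to the model would differ from those it was trained on.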

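Once I have the final model, I run predictors(L_model), which selects 10 predictors. Then, in the new dataset (here in the form of raster files), I remove the predictors that were not selected and try to predict using L_model$fit, but the resulting raster map looks very strange and is not what I expected.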
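The file-selection step described above can be sketched in base R as follows. The predictor and file names are hypothetical stand-ins for the output of `predictors(L_model)` and `list.files()`. The point is only to check that every selected predictor has a matching layer name, since (as far as I understand) `raster::predict` hands cell values to the model as columns named after the layers.

```r
# Hypothetical stand-ins: names as returned by predictors(L_model) and the
# file names found on disk (neither is taken from the original data)
selected  <- c("temp_mean", "ndvi_max", "elev")
all_files <- c("elev.tif", "ndvi_max.tif", "precip.tif", "temp_mean.tif")

# Layer names are assumed to be the file names without the extension
layer_names <- tools::file_path_sans_ext(all_files)

# Keep only the files whose layer name is a selected predictor
keep_files <- all_files[layer_names %in% selected]

# Selected predictors with no matching file would be missing at prediction time
missing_layers <- setdiff(selected, layer_names)
```

An empty `missing_layers` means every selected predictor has a raster. A non-empty result points to a naming mismatch between the model and the files.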

# model.data_fluigdigm contains arcsine-sqrt transformed prevalence data and 62
# predictors, all spatially extracted from raster files (as the prevalence
# data is also spatial with x and y coordinates) - I have already run
# correlation tests and removed highly correlated predictors (started out
# with 104 predictors). I have 50 data points in total, which is why I would
# like to reduce the number of predictors further.

library(caret)

# create a trainControl element:
fitControl <- trainControl(method = "LOOCV",
                           savePredictions = "final",
                           returnResamp = "final")

## tuning grid for the linear model
grid_linear <- expand.grid(.C = c(0.01, 0.1, 1))

# running the model
L_model <- caret::rfe(Prev ~ .,
                      data = model.data_fluigdigm1,
                      method = "svmLinear",
                      tuneGrid = grid_linear,
                      preProc = c("center", "scale"),
                      sizes = c(5, 10, 20, 30, 40),
                      trControl = fitControl,
                      rfeControl = rfeControl(functions = caretFuncs,
                                              method = "LOOCV",
                                              returnResamp = "final"))


#check model fit from CV
postResample(L_model$fit$pred$pred, L_model$fit$pred$obs)
[1]  RMSE  Rsquared       MAE 
0.1771695 0.4771384 0.1337923 

library(raster)

# now I want to predict to a larger area based on predictors in raster files
## I have a total of 62 raster files and a mask raster
rasterFiles <- list.files(path = "E:/Predictors/Predictors_dummy",
                          full.names = TRUE)
maskRaster <- raster("E:/Vecmap/ScandTick/GIS/maskRaster.tif")

# predictors selected by the model
rasterList <- predictors(L_model)
rasterList <- paste0(rasterList, ".tif")

# only select these predictor rasters to further predict
# (match on the file name, since rasterFiles holds full paths)
rasterFiles <- rasterFiles[basename(rasterFiles) %in% rasterList]

# create raster brick
ras1 <- lapply(rasterFiles, raster)
ras2 <- lapply(ras1, crop, maskRaster)
ras2 <- lapply(ras2, raster::mask, maskRaster)
myBrick <- brick(ras2)

# the only way I can predict is by using L_model$fit; if I try just using
# L_model, I get an error asking for all the predictors entered into the
# model. When using L_model$fit, are the predictors then automatically
# preprocessed?

p <- predict(myBrick, L_model$fit)

writeRaster(p, "Prev_Ri2", datatype = "FLT4S", format = "GTiff",
            overwrite = TRUE)

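The resulting raster looks very strange. Of course, this may simply be because the model is not good enough, but I want to make sure my approach is correct: is using L_model$fit the right way to do this?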

0 answers:

No answers