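I am running rfe with caret in R on a linear SVM model with a tuning grid (as an example; I am also trying radial and polynomial SVMs). I am trying to reduce the number of predictors (I have 62 in total) and then want to use the final model to predict on new data. I assume that once the final set of predictors has been chosen, the new data passed to the predict function should contain only those selected predictors rather than the full set. Is that right?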
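I am preprocessing the predictors in the model (centering and scaling), and I would like to know whether the final model applies this preprocessing automatically, or whether I have to do it myself before predicting.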
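One way to check the preprocessing question empirically is a small toy comparison. The sketch below is not the model from this post: it assumes a plain `lm` fitted through `caret::train` on made-up data (`dat`, `new`, `x1`, `x2` are all hypothetical), simply because that keeps the example fast and free of extra dependencies. It compares `predict()` on the `train` object with a manual route that first applies the stored `preProcess` object and then calls the underlying final model.

```r
library(caret)

set.seed(1)
# Toy data: predictors on very different scales (hypothetical, for illustration only)
dat <- data.frame(x1 = rnorm(40, mean = 100, sd = 20),
                  x2 = rnorm(40, mean = 5, sd = 1))
dat$y <- 0.02 * dat$x1 + dat$x2 + rnorm(40, sd = 0.1)

# Fit through train() with centering and scaling, no resampling
fit <- train(y ~ ., data = dat, method = "lm",
             preProcess = c("center", "scale"),
             trControl = trainControl(method = "none"))

new <- data.frame(x1 = c(90, 110), x2 = c(4, 6))

# Route 1: predict on the train object with raw new data
p_auto <- unname(predict(fit, newdata = new))

# Route 2: apply the stored preprocessing by hand, then use the final model
new_pp <- predict(fit$preProcess, new)
p_manual <- unname(predict(fit$finalModel, newdata = new_pp))

# Route 3: final model on raw data, skipping the preprocessing
p_raw <- unname(predict(fit$finalModel, newdata = new))

print(all.equal(p_auto, p_manual))
print(cbind(p_auto, p_manual, p_raw))
```

If routes 1 and 2 agree while route 3 does not, that suggests the `train` object handles the preprocessing itself when given raw new data, whereas the bare `finalModel` does not. Whether the same holds for the `svmLinear` fit stored in `L_model$fit` would need to be confirmed on the actual model.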
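Once I have the final model, I run predictors(L_model), which returns the 10 selected predictors. Then, in the new dataset (here a set of raster files), I remove the predictors that were not selected and try to predict using L_model$fit. However, the resulting raster map looks strange and is not what I expected.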
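The subsetting step described above can be tried out on toy data. The sketch below is an illustration under stated assumptions, not the setup from this post: it uses the x/y interface of `rfe` with `lmFuncs` on made-up predictors (`pred1` to `pred6` are hypothetical names), and checks whether predicting from new data that holds only the selected columns gives the same result as predicting from new data that holds all columns.

```r
library(caret)

set.seed(42)
n <- 60
# Six made-up predictors; only the first two drive the response
x <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(x) <- paste0("pred", 1:6)
y <- 2 * x$pred1 - 1.5 * x$pred2 + rnorm(n, sd = 0.2)

# Recursive feature elimination with linear-model helper functions
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x, y, sizes = c(2, 4), rfeControl = ctrl)

# Names of the predictors kept in the final model
selected <- predictors(rfe_fit)

# New data with all six columns, and a copy reduced to the selected ones
new_x <- as.data.frame(matrix(rnorm(5 * 6), ncol = 6))
names(new_x) <- names(x)
new_selected <- new_x[, selected, drop = FALSE]

p_all <- predict(rfe_fit, new_x)
p_sel <- predict(rfe_fit, new_selected)

print(selected)
print(all.equal(p_all, p_sel))
```

With the x/y interface the `rfe` object appears to pick out the selected columns itself, so both calls should agree. With the formula interface, as far as I understand, the model frame is rebuilt from the original formula, which would be consistent with the error asking for all predictors when predicting from `L_model` directly.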
# model.data_fluigdigm contains arcsine-sqrt transformed prevalence data and 62
# predictors, all spatially extracted from raster files (as the prevalence
# data is also spatial with x and y coordinates) - I have already run
# correlation tests and removed highly correlated predictors (started out
# with 104 predictors). I have 50 data points in total, which is why I would
# like to reduce the number of predictors further.
# create a trainControl element:
fitControl <- trainControl(method = "LOOCV",
                           savePredictions = "final",
                           returnResamp = "final")
## tuning grid for the linear model
grid_linear <- expand.grid(.C = c(0.01, 0.1, 1))
# running the model
L_model <- caret::rfe(Prev ~ ., data = model.data_fluigdigm1,
                      method = "svmLinear",
                      tuneGrid = grid_linear,
                      preProc = c("center", "scale"),
                      sizes = c(5, 10, 20, 30, 40),
                      trControl = fitControl,
                      rfeControl = rfeControl(functions = caretFuncs,
                                              method = "LOOCV",
                                              returnResamp = "final"))
# check model fit from CV
postResample(L_model$fit$pred$pred, L_model$fit$pred$obs)
#      RMSE  Rsquared       MAE
# 0.1771695 0.4771384 0.1337923
# now I want to predict to a larger area based on predictors in raster files
## I have a total of 62 raster files and a mask raster
rasterFiles <- list.files(path = "E:/Predictors/Predictors_dummy")
maskRaster <- raster('E:/Vecmap/ScandTick/GIS/maskRaster.tif')
# predictors selected by the model
rasterList <- predictors(L_model)
rasterList <- paste(rasterList,".tif",sep="")
#only select these predictor rasters to further predict
rasterFiles <- rasterFiles[rasterFiles %in% rasterList]
#create raster brick
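# list.files() above returns bare file names, so prepend the directory when reading
ras1 <- lapply(file.path("E:/Predictors/Predictors_dummy", rasterFiles), raster)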
ras2 <- lapply(ras1,crop,maskRaster)
ras2 <- lapply(ras2,raster::mask,maskRaster)
myBrick <- brick(ras2)
# The only way I can predict is by using L_model$fit. If I try using just
# L_model, I get an error asking for all the predictors entered into the
# model. When using L_model$fit, are the predictors then automatically
# preprocessed?
p <- predict(myBrick, L_model$fit)
writeRaster(p, "Prev_Ri2", datatype = "FLT4S", format = "GTiff",
            overwrite = TRUE)
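The resulting raster looks very strange. Of course, that may simply be because the model is not good enough, but I want to make sure my approach is correct: is using L_model$fit the right way to do this?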
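One thing that could be ruled out before blaming the model is a mismatch between the layer names of the brick and the predictor names the model expects. The sketch below uses only base R and a hypothetical helper, `check_layers`, with made-up names standing in for `predictors(L_model)` and `names(myBrick)`. It assumes that the raster layers are matched to model predictors by name, which is worth confirming for the raster version in use.

```r
# Hypothetical helper: compare the predictors a model expects with the
# layer names available in a raster stack or brick
check_layers <- function(selected, layer_names) {
  list(
    missing = setdiff(selected, layer_names),  # expected by the model, absent in the rasters
    extra   = setdiff(layer_names, selected)   # present in the rasters, not used by the model
  )
}

# Stand-ins for predictors(L_model) and names(myBrick); the names are invented
selected <- c("ndvi", "temp_mean", "precip")
layers   <- c("precip", "ndvi", "temp.mean")

res <- check_layers(selected, layers)
print(res)
```

In this toy case `temp_mean` is reported as missing and `temp.mean` as extra, the kind of silent renaming that can happen when layer names are derived from file names. On the real objects the call would be something like `check_layers(predictors(L_model), names(myBrick))`.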