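At present it does not seem possible to predict directly to a raster brick with a multinomial gbm model. For relatively small raster grids this is easy to work around, as illustrated below. But when you are dealing with large rasters, many classes (in my case vegetation communities) and many predictor variables, the process shown here is very slow and not without its challenges. I hope the information below is useful to anyone facing the same problem.
Below I use a multinomial gbm model with 20 predictor variables to predict the probability of occurrence of 36 vegetation communities. My study area is a 30x30 m raster grid with 213,000,000 pixels - but the code below relates to the 1221 cells I used to develop/test the process.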
> require (gbm)
> require (raster)
> require (rgdal)
> load("gbmmodel_p20.Rda")
> print(gbmmodel)
gbm(formula = as.formula(Nclustal_1 ~ tcd_coast_disa_f + tce_raddq_f +
tce_radwq_f + tct_temp_minwin_f + tct_tempdq_f + tcw_clim_etaaann_f +
tcw_precipseas_f + tcw_precipwq_f + tcw_rain1mm_f + tdd_strmdstge6_i +
tlf_logre10_f + tlf_rough0500_f + trs_land_pfc_2008 + trs88_sspr_g_50p +
trs88_ssum_b_50p + trs88_ssum_d_50p + tsp_bd200_f + tsp_cly200a_f +
tsp_ph200_f + tsp_tn060a_f), distribution = "multinomial",
data = gbmdata, n.trees = 2500, interaction.depth = 2, n.minobsinnode = 3,
shrinkage = 0.003, bag.fraction = 0.75, train.fraction = 1,
cv.folds = 8, keep.data = TRUE, verbose = TRUE, class.stratify.cv = TRUE,
n.cores = 8)
A gradient boosted model with multinomial loss function.
2500 iterations were performed.
The best cross-validation iteration was 2500.
There were 20 predictors of which 20 had non-zero influence.
I stacked the predictor variables into a raster stack as follows:
> img.files <- list.files("/mnt/scratch/mcilwea/R/TSG/inmodel20_test",
pattern='\\.img$', full.names=TRUE)
> rasStack <- stack(img.files)
> NAvalue(rasStack) <- -9999
> projection(rasStack)
"+proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs"
It is important to check that the names in rasStack are the same as the names in the model above:
> names(rasStack)
[1] "tcd_coast_disa_f" "tce_raddq_f" "tce_radwq_f"
[4] "tct_tempdq_f" "tct_temp_minwin_f" "tcw_clim_etaaann_f"
[7] "tcw_precipseas_f" "tcw_precipwq_f" "tcw_rain1mm_f"
[10] "tdd_strmdstge6_i" "tlf_logre10_f" "tlf_rough0500_f"
[13] "trs88_sspr_g_50p" "trs88_ssum_b_50p" "trs88_ssum_d_50p"
[16] "trs_land_pfc_2008" "tsp_bd200_f" "tsp_cly200a_f"
[19] "tsp_ph200_f" "tsp_tn060a_f"
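Note that the stack lists its layers in a different order from the model formula (for example, tct_tempdq_f comes before tct_temp_minwin_f), so comparing the two by eye is error-prone. A minimal sketch of a set-based check follows. The vectors `model_vars` and `stack_names` are shortened stand-ins typed in for illustration; in the session above they would presumably come from gbmmodel$var.names and names(rasStack).

```r
# Stand-in vectors (shortened); replace with gbmmodel$var.names and names(rasStack).
model_vars  <- c("tcd_coast_disa_f", "tct_temp_minwin_f", "tct_tempdq_f", "trs_land_pfc_2008")
stack_names <- c("tcd_coast_disa_f", "tct_tempdq_f", "tct_temp_minwin_f", "trs_land_pfc_2008")

missing_from_stack <- setdiff(model_vars, stack_names)  # predictors with no layer
extra_in_stack     <- setdiff(stack_names, model_vars)  # layers the model never uses

# Fail early if any predictor has no matching layer.
stopifnot(length(missing_from_stack) == 0)

# TRUE when both hold exactly the same names, regardless of order.
same_set <- setequal(model_vars, stack_names)
same_set
```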
Before running predict.gbm, I retrieve the best iteration of the model:
> best.iter <- gbm.perf(gbmmodel, method = "cv", plot.it = TRUE)
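I can create a set of raster output images for my test area, which contains 1221 cells, by converting the grid cells to a set of spatial points (as shown below).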
points<-raster(img.files[1])
points.df <- as.data.frame(rasterToPoints(points))
coordinates(points.df) <- ~x+y
plot(points.df)
coords <- coordinates(points.df)
rasterOut <- extract(rasStack, coords)
outTable<- as.data.frame(cbind(coords, rasterOut))
outTable[1:1,1:22]
x y tcd_coast_disa_f tce_raddq_f tce_radwq_f tct_temp_minwin_f tct_tempdq_f
149.1269 -35.6457 1.052329 10.82778 23.63533 -0.9852222 5.928154
tcw_clim_etaaann_f tcw_precipseas_f tcw_precipwq_f tcw_rain1mm_f tdd_strmdstge6_i tlf_logre10_f
600 13.93321 179.9841 80.2064 491 1.945529
tlf_rough0500_f trs_land_pfc_2008 trs88_sspr_g_50p trs88_ssum_b_50p trs88_ssum_d_50p tsp_bd200_f
15.6701 0 0.38 0.09000003 0.55 1.590021
tsp_cly200a_f tsp_ph200_f tsp_tn060a_f
33.33834 5.648166 0.03193555
Running predict.gbm on the model:
predtable <- as.data.frame(predict.gbm(gbmmodel, outTable, n.trees=best.iter, type="response"))
predout <- cbind(coords,predtable)
predout[1:1,1:38]
x y e24.2500 e26.2500 e59.2500 g152.2500 g157.2500 g94.2500 m31.2500
149.1269 -35.6457 0.001286283 0.0006473167 0.002043077 0.4973372 8.686316e-05 0.0006710651 0.01067058
m36.2500 m68.2500 MU11.2500 MU45.2500 OTHER.2500 p14.2500 p15.2500 p17.2500
0.004314056 0.007128109 0.0005012718 0.0006254022 0.1727706 0.1411112 0.0009099294 0.0002520156
p19.2500 p20.2500 p22.2500 p220.2500 p23.2500 p24.2500 p27.2500 p338.2500
0.003205936 0.002534798 0.0001474091 0.001214219 0.008455798 0.01701965 0.001879607 0.002238932
p420.2500 p520.2500 p54.2500 p9.2500 u118.2500 u179.2500 u21.2500 u22.2500
0.001456685 0.00108458 0.0003695966 0.02501649 0.0005977814 0.01711885 0.0558054 0.002357498
u23.2500 u27.2500 u28.2500 u78.2500 Unit5.2500
0.00040357 0.001422519 0.0002764237 0.01699094 4.835942e-05
write.csv(predout, "Predout.csv", row.names=TRUE)
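Because type="response" returns class probabilities, each row of predtable should sum to approximately 1, and the most likely community for a cell is the column with the largest value. The following is a small sketch of both checks in base R. The `probs` data frame holds made-up numbers for illustration, not model output; only its column-name pattern mimics predtable.

```r
# Made-up probabilities for 3 cells and 4 classes; column names mimic predtable.
probs <- data.frame(e24.2500  = c(0.10, 0.05, 0.70),
                    g152.2500 = c(0.60, 0.15, 0.10),
                    p14.2500  = c(0.20, 0.75, 0.15),
                    u21.2500  = c(0.10, 0.05, 0.05))

# Each row should sum to 1 (within floating-point tolerance).
row_totals <- rowSums(probs)
stopifnot(all(abs(row_totals - 1) < 1e-6))

# Name of the column holding the highest probability in each row.
top_class <- names(probs)[max.col(as.matrix(probs), ties.method = "first")]
top_class
```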
I can write the probability-of-occurrence values from predtable to a set of 36 new raster images as follows:
names <- names(predtable)
for (i in 1:length(names)) {
SpatialPointspredTable <- SpatialPointsDataFrame (coords=coords, data=predtable[i])
gridded(SpatialPointspredTable)=TRUE
rasValues <- raster(SpatialPointspredTable)
projection(rasValues) <- "+proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs"
plot(rasValues)
writeRaster(rasValues, filename=names[i], format="HFA", overwrite=TRUE)
}
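The column names of predtable carry the number of trees as a suffix (e24.2500, g152.2500, and so on), and the loop above passes them straight to writeRaster as file names. If cleaner names are wanted, the suffix could be stripped first. A sketch, where `col_names` is a shortened stand-in for names(predtable) and `class_names` is a hypothetical variable:

```r
# Shortened stand-in for names(predtable).
col_names <- c("e24.2500", "g152.2500", "MU11.2500", "Unit5.2500")

# Drop a trailing ".<digits>" (the n.trees value) and keep the class label.
class_names <- sub("\\.[0-9]+$", "", col_names)
class_names
```

class_names[i] could then be used in place of names[i] in the writeRaster call.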
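This gives me the output I am after - but rather than having to predict to a data frame, the process would be faster and more efficient if it were possible to predict straight to a raster brick.
If I run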
predict(rasStack,
gbmmodel,
n.trees=best.iter,
filename="multiclass_BRT_20p_test_idrisi",
format="IDRISI",
na.rm=FALSE,
type="response",
overwrite=TRUE,
progress="text",
cores=8)
the output is a raster grid representing the first vegetation community I want to predict:
|=========================================================| 100%
class : RasterLayer
dimensions : 33, 37, 1221 (nrow, ncol, ncell)
resolution : 0.0002777778, 0.0002777778 (x, y)
extent : 149.1268, 149.1371, -35.65473, -35.64556 (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : /mnt/scratch/mcilwea/R/TSG/multiclass_BRT_20p_test_idrisi.rdc
names : layer
values : 3.762369e-06, 0.9337785 (min, max)
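The IDRISI file format does not support multi-band images, so I cannot add index=1:36 to the mix to produce a multi-band raster brick as output. If I try to do this by setting format="GTiff" or "HFA" (or any other format that requires rgdal), I get the error message:
"Error in rgdal::putRasterData(x@file@transient, v, band = 1, offset = off) : Failure during raster IO"
If I set format="raster" I can get a RasterBrick output, but it does not let me read/write any data other than what is in the IDRISI image (the first output column of the predict.gbm model):
"Warning message:
In .rasterFromRasterFile(grdfile, band = band, objecttype, ...) :
  size of values file does not match the number of cells (given the data type)"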
predrast <- predict(object=rasStack,
model=gbmmodel,
n.trees=best.iter,
filename="multi_test",
fun=predict.gbm,
format="raster",
index=1:5,
bandorder="BIL",
ext=extent(rasStack[[1:20]]),
na.rm=FALSE,
type="response",
datatype="FLT4S",
overwrite=TRUE,
progress="text",
cores=8)
|=====================================================================100%
predrast
class : RasterBrick
dimensions : 33, 37, 1221, 5 (nrow, ncol, ncell, nlayers)
resolution : 0.0002777778, 0.0002777778 (x, y)
extent : 149.1268, 149.1371, -35.65473, -35.64556 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs
data source : C:\Data\FINAL_TSG\test\multi_test.grd
names : layer.1, layer.2, layer.3, layer.4, layer.5
min values : 3.762369e-06, 3.762369e-06, 3.762369e-06, 3.762369e-06, 3.762369e-06
max values : 0.9337785, 0.9337785, 0.9337785, 0.9337785, 0.9337785
If I try to convert the raster brick above to a set of individual raster images with
writeRaster(predrast, filename="multi_test.img", format="HFA", bylayer=TRUE, suffix="numbers", overwrite=TRUE)
none of the images make any sense.
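One possible explanation, not verified here against the raster source: for a multinomial model, predict.gbm returns a three-dimensional array (rows x classes x number of n.trees values), which is why the data-frame columns earlier are named e24.2500 and so on, whereas the index argument of raster's predict is described in terms of a matrix with multiple columns. Under that assumption, a wrapper that flattens the array to a matrix before handing it back may help. The function names `flatten_gbm_probs` and `predict_gbm_matrix` are hypothetical, and the `fake` array is a stand-in for a real predict.gbm result.

```r
# Flatten a rows x classes x n.trees array into a rows x classes matrix,
# keeping only the first n.trees slice. Vectors and matrices pass through.
flatten_gbm_probs <- function(p) {
  if (length(dim(p)) != 3) return(p)
  matrix(p[, , 1], nrow = dim(p)[1], ncol = dim(p)[2],
         dimnames = dimnames(p)[1:2])
}

# Hypothetical wrapper with a (model, data, ...) signature, intended to be
# passed as the `fun` argument of raster's predict.
predict_gbm_matrix <- function(model, data, ...) {
  flatten_gbm_probs(predict(model, data, ...))
}

# Stand-in for a predict.gbm result: 4 cells, 3 classes, one n.trees value.
fake <- array(runif(12), dim = c(4, 3, 1),
              dimnames = list(NULL, c("e24", "e26", "e59"), "2500"))
probs <- flatten_gbm_probs(fake)
dim(probs)       # 4 3
colnames(probs)  # "e24" "e26" "e59"
```

With such a wrapper, the raster call would look something like predict(rasStack, gbmmodel, fun=predict_gbm_matrix, index=1:36, n.trees=best.iter, type="response", filename="multi_test", format="raster"). This has not been run against the data in this post.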
It is also a bit puzzling that if I try to write a multi-band CDF image, I get a different set of messages from the rgdal error:
| 0%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart), :
put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
|======== | 25%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart), :
put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
|================== | 50%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart), :
put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
|=============================================== | 75%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart), :
put.var.ncdf: error: you asked to write 7992 values, but the passed data array only has 7955 entries!
|=============================================================| 100%
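The numbers in these messages look consistent with the writer expecting 37 columns x 36 layers for blocks of 9, 9, 9 and 6 rows (33 rows in total). This assumes index=1:36 was in effect for the call and that the raster was processed in blocks of those heights, neither of which the output confirms. The arithmetic can be checked as:

```r
ncols      <- 37             # from the raster dimensions shown earlier
layers     <- 36             # assumed: one layer per vegetation community
block_rows <- c(9, 9, 9, 6)  # assumed block heights; they sum to 33 rows

expected  <- ncols * block_rows * layers   # values the writer asked for
received  <- c(11840, 11840, 11840, 7955)  # values reported in the errors
shortfall <- expected - received

expected   # 11988 11988 11988 7992
shortfall  # 148 148 148 37
```

Each shortfall is a multiple of 37 (148 = 4 x 37), the number of columns in the raster, though what causes it is not clear from the output.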
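I am not sure what is going on here.
It would be great if anyone, perhaps working with the authors of the gbm package, knows how to predict directly to a raster brick without running into any of the problems above.
If anyone wants to see the code I used on the full raster grid, leave a comment below and I will be happy to provide it.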
Cheers, Allen
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252
attached base packages:
[1] parallel splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] ncdf_1.6.8 rgdal_0.9-1 gbm_2.1 lattice_0.20-30 survival_2.37-7 raster_2.3-24 sp_1.0-17
loaded via a namespace (and not attached):
[1] grid_3.1.2 tools_3.1.2
# Traceback error for
Error in rgdal::putRasterData(x@file@transient, v, band = 1, offset = off) :
Failure during raster IO
> traceback()
7: .Call("RGDAL_PutRasterData", raster, rasterData, as.integer(offset),
PACKAGE = "rgdal")
6: rgdal::putRasterData(x@file@transient, v, band = 1, offset = off)
5: writeValues(predrast, predv, tr$row[i])
4: writeValues(predrast, predv, tr$row[i])
3: .local(object, ...)
2: predict(object = rasStack, model = gbmmodel, n.trees = best.iter,
filename = "multi_img", format = "HFA", na.rm = FALSE, type = "response",
datatype = "FLT4S", overwrite = TRUE, progress = "text")
1: predict(object = rasStack, model = gbmmodel, n.trees = best.iter,
filename = "multi_img", format = "HFA", na.rm = FALSE, type = "response",
datatype = "FLT4S", overwrite = TRUE, progress = "text")