如何在data.table中调用gDistance?

时间:2016-03-01 15:44:38

标签: r data.table gis rgdal

我使用了ddply和gDistance(package:rgeos)来链接两组多边形。然而,对于我的大型数据集来说,这个过程非常缓慢,正如许多博客中所解释的那样(例如Why is plyr so slow?)。

这些博客建议使用data.table会更快,但我无法弄清楚如何让它适用于我的情况。特别是,如何在data.table ......中仅对我的栖息地清除数据的子集应用gDistance?

下面我将更详细地解释我要做的事情,并附上我的一小部分数据用于测试代码(此处:https://www.dropbox.com/sh/9onaeltb81qrd7h/AAAdED0KS6n2EzP74F3o81Lxa?dl=0)。

我有两个包含多边形的形状文件(其中一个栖息地间隙和其他属性因为清理栖息地而受到惩罚),其中我想识别第一层(栖息地间隙)中的多边形,该多边形最接近第二层中的每个多边形图层(惩罚属性),因为两者并不总是完全重叠。另外,还有一个额外的限制;匹配的栖息地清除多边形不得超过环境犯罪5年。

我已成功使用以下ddply代码 - 有关如何使用data.table执行此操作的任何建议,我认为会更快...?

谢谢!

# # # # # # # # # # # # # # # # # # # Load [R] GIS packages. # # # # # # # # # # # # # # # # # # 
library(rgeos)
library(raster)
library(rgdal)
library(plyr)

# # # # # # # # # # # # # # # # # # # ENTER shapefile information HERE # # # # # # # # # # # # # # # # # # # # # #


# What is the name of the punishments shapefile?
punishments <- "punishments_stack_overflow"  


# What is the name of the PUNISHMENTs directory?
myDrctry <- "E:/Esri_arcGIS_datasets/SM_data/IBAMA_embargo/final_embargo_list/near_chopped/stack_overflow" # !CHANGE ME - where the shapefiles are stored


# What is the name of the hab_cl shapefile?
hab_cl_shp <- "RO_SimU_deforestation_Amazonia_SIRGAS_near" 


# What is the name of the hab_cl data directory?
my_hab_cl_Drctry <- "E:/Esri_arcGIS_datasets/SM_data/PRODES/Deforestation_per_SimU/near_analysis" #! CHANGE ME




# # # # # # # # # # # # # # # # # # # # Load the shapefiles  # # # # # # # # # # # # # # # # # # # # # # 


# Read in the embargo shapefile
punishments_need_near <- readOGR(dsn=myDrctry, layer=punishments)


# Identify the attributes to keep
myattributes <- c("numero_tad", "data_tad", "CD_BIOMA")


# Subset the full dataset extracting only the desired attributes
punishments_need_near@data <- punishments_need_near@data[,myattributes]


# Load the deforestation data 
hab_cl_patches_near <- readOGR(dsn=my_hab_cl_Drctry, layer=hab_cl_shp) 
proj4string(hab_cl_patches_near)               # check the projection (which is SIRGAS 2000 UTM)
#hab_cl_patches_near@data <- hab_cl_patches_near@data[,c("year","LAPIG_ID")] # manipulate the columns to match oter dataframes
#names(hab_cl_patches_near@data) <- c("ano", "PRODES_ID")
head(hab_cl_patches_near)                                                   # check that it worked 


# # # # # # # # # # # # # # # # # # # # # # # Run the loop # # # # # # # # # # # # # # # # # # # # # # # 


# Use ddply to calculate nearest distance for each embargo ("numero_tad")
tmp <- ddply(punishments_need_near@data, .(numero_tad), function(x) {    # numero_tad is a unique identifier per punishment
  ID <- x$numero_tad[1]
  tmp.punishments <- punishments_need_near[punishments_need_near@data$numero_tad == ID,]                    
  tmp.patches <- hab_cl_patches_near[(hab_cl_patches_near@data$ano +5) >= tmp.punishments@data[,"ano_new"] &     # match the punishments with habitat clearance in last 5 years (ano_new = yr of punishment, ano = yr of habitat clearance)
                                     hab_cl_patches_near@data$ano <= tmp.punishments@data[,"ano_new"],]          # and not after the punishment itself
  obj <- gDistance(tmp.punishments, tmp.patches, byid=TRUE)                                                      # calculate the distance between each punishment and patch of habitat clearance
  df  <- data.frame(numero_tad = ID, PRODES_ID = tmp.patches$PRODES_ID[which.min(obj)], dist = min(obj))    # link punishment with the nearest suitable patch of habitat clearance
}, .progress='text') # progress bar

head(tmp)

1 个答案:

答案 0 :(得分:0)

您的多边形存储在Spatial Polygons Data Frame类对象中。 data.table仅适用于data.table类型对象。您需要将data.table创建的数据子集转换为空间多边形数据框。

gDistance需要Spatial对象,但它正在接收数据表对象。

要解决此问题,请将与多边形对应的列转换回聚合函数中的多边形。

您的Dropbox链接已损坏,但这是使用空间点(而非多边形)的示例。我假设采用相同的方法。

library(sp)
library(rgeos)

punishments <- data.frame(numero_tad = sample(1:3, 10, replace=TRUE), ano_new = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10) )
patches <- data.frame(PRODES_ID = 1:10, ano = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10) )
head(punishments)
head(patches)

coordinates(punishments) <- ~x+y # convert to Spatial object
coordinates(patches) <- ~x+y # convert to Spatial object

class(punishments) # "SpatialPointsDataFrame"
head(punishments)

link <- function(SD) {
  coordinates(SD) <- ~x+y # convert from data.table to Spatial object
  # you'll need to do something like to above to convert back to a polygon
  tmp.patches <- patches[(patches$ano + 5) >= SD$ano_new & patches$ano < SD$ano_new, ]
  obj <- gDistance(SD, tmp.patches, byid=TRUE)
  df <- list(PRODES_ID = tmp.patches$PRODES_ID[which(obj == min(obj), arr.ind=TRUE)[1]], dist = min(obj) )
  return(df)
}

dt <- as.data.table(punishments)
class(dt)
head(dt)
setkey(dt, numero_tad)

tmp <- dt[,link(.SD), by=numero_tad]

希望有所帮助!