Sparse spatial weights matrix in R using longitude and latitude

Date: 2015-07-24 02:26:40

Tags: r spatial

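Quick overview: I need a function that takes longitude and latitude data and outputs a matrix where Wij is the inverse distance if distance(i,j) is below a maximum distance, and zero if it is above. It needs to work on large data sets.

Description: I have a data set of 40,000 observations on deforestation. I am studying how deforestation is spatially correlated, so I want to create a spatial weights matrix. The data are the longitudes and latitudes of the centroids of different regions.

I would like W = SpatWeight(long, lat, max), where long and lat are coordinate vectors, max is a distance scalar, and W is a sparse matrix in which Wij is the inverse distance (in km) between point i and point j (Wij = 0 if i == j or dist(i,j) > max).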
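Written out as a formula, the definition above reads as follows. The symbols d_ij (distance in km between points i and j) and d_max (the max argument) are notation introduced here for illustration, not names from the original post.

```latex
\[
W_{ij} =
\begin{cases}
  \dfrac{1}{d_{ij}} & \text{if } i \neq j \text{ and } d_{ij} \le d_{\max}, \\[1ex]
  0 & \text{otherwise.}
\end{cases}
\]
```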

For example, I would like the output of SpatWeight(c(0,45,45,180), c(0,45,0,0), 15000) to be:

[1,] .            1.497133e-04 0.0001996178 .           
[2,] 0.0001497133 .            0.0001996178 7.485666e-05
[3,] 0.0001996178 1.996178e-04 .            .           
[4,] .            7.485666e-05 .            .
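The entries above can be checked by hand. The sketch below uses a base-R haversine distance on a sphere; the helper name haversine_km and the radius of 6378.388 km are assumptions made here (that radius reproduces the values in the example), not part of the original post.

```r
# Great-circle (haversine) distance in km on a sphere of assumed radius.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6378.388) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a))) # pmin guards against rounding above 1
}

d13 <- haversine_km(0, 0, 45, 0)  # points 1 and 3: a quarter of 180 degrees apart
w13 <- 1 / d13                    # inverse-distance weight for entry [1,3]

d14 <- haversine_km(0, 0, 180, 0) # points 1 and 4: antipodal along the equator
beyond_max <- d14 > 15000         # TRUE, so entry [1,4] is zero
```

Here w13 comes out at roughly 0.0001996, which agrees with entry [1,3], and points 1 and 4 are farther apart than the 15000 km cutoff, which is why entry [1,4] is empty.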

The code I have written so far is:

  library(fields) # for rdist.earth
  library(Matrix) # for Matrix(..., sparse = TRUE)
  SpatWeight <- function(long, lat, max) {
    W <- rdist.earth(cbind(long, lat), miles = FALSE) # dense n x n distance matrix in km
    W[W > max] <- 0               # observations beyond max distance become zero
    W <- ifelse(W != 0, 1 / W, W) # invert the non-zero distances
    diag(W) <- 0                  # self-distances are not always exactly zero
    W <- Matrix(W, sparse = TRUE) # converted to sparse only after the dense matrix exists
    return(W)
  }

My problem, besides the code not being well written :-), is that the matrix for the real data is too large for the computer's memory.
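A rough back-of-the-envelope calculation shows why. The sketch below assumes the distance matrix is stored as a dense matrix of 8-byte doubles, which is how R stores numeric matrices; the variable names are illustrative.

```r
n <- 40000                    # number of observations in the question
bytes_dense <- n^2 * 8        # one 8-byte double per pair of points
gb_dense <- bytes_dense / 1e9 # size in gigabytes (12.8)
```

That is about 12.8 GB for a single copy of the dense matrix, and steps such as ifelse() create further copies before the result is ever converted to a sparse matrix.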

Any help on how to solve this would be greatly appreciated. Thanks again! -Sean

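1 answer:

Answer 0 (score: 0)

I ended up writing code myself that runs on large data sets. I am posting it here in case any wayward traveler runs into the same problem I did. Hope it helps someone :-)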

library(Matrix)    # sparse matrices
library(geosphere) # distGeo

SpatialWeight <- function(SortedData, max, DistFunc = function(x) 1 / x, rownormal = FALSE) {
  # Creates a spatial weights matrix for a large data set.
  #
  # Args:
  #   SortedData: data frame with columns Pos, latitude and longitude.
  #     - Pos holds each observation's index in 1:nrow(SortedData); it is used
  #       as the row and column index of the output.
  #     - Rows must be sorted by latitude, smallest to largest.
  #   max: maximum distance in km for spatial correlation.
  #   DistFunc: function that turns distances into weights.
  #             Defaults to inverse distance, 1/dist(i,j).
  #   rownormal: whether each row should be normalized to sum to 1.
  #              Either TRUE or FALSE. Defaults to FALSE.
  # Returns:
  #   Sparse n x n matrix of spatial weights, where n = nrow(SortedData).
  DegLat <- 40008 / 360  # km per degree of latitude, from the polar circumference
  DegLong <- 40075 / 360 # km per degree of longitude at the equator
  len <- nrow(SortedData)
  lat <- SortedData$latitude
  long <- SortedData$longitude
  pos <- SortedData$Pos
  M <- Matrix(0, nrow = len, ncol = len, sparse = TRUE)
  latmin <- 1; latmax <- 1 # row window covering the current latitude band
  for (r in seq_len(len)) {
    LatMin <- lat[r] - max / DegLat
    LatMax <- lat[r] + max / DegLat
    # A degree of longitude shrinks with cos(latitude), so widen the box using
    # the most poleward latitude in the band; near a pole, keep every longitude.
    CosLat <- cos(min(abs(lat[r]) + max / DegLat, 90) * pi / 180)
    LongSpan <- if (CosLat > 1e-6) max / (DegLong * CosLat) else 360
    LongMin <- long[r] - LongSpan
    LongMax <- long[r] + LongSpan
    # Slide the window over the rows whose latitude is within max km.
    # !!! DATA MUST BE SORTED BY LATITUDE: both ends then only move forward. !!!
    while (latmin < r && lat[latmin] < LatMin) latmin <- latmin + 1
    while (latmax < len && lat[latmax + 1] <= LatMax) latmax <- latmax + 1
    SubGroup <- SortedData[latmin:latmax, ]
    # Subset by longitude to get a bounding rectangle (no wrap-around at +/-180).
    SubGroup <- SubGroup[SubGroup$longitude > LongMin & SubGroup$longitude < LongMax, ]
    if (nrow(SubGroup) == 0) next
    # Geodesic distance in km from point r to every candidate in the rectangle.
    W <- distGeo(c(long[r], lat[r]), as.matrix(SubGroup[, c("longitude", "latitude")])) / 1000
    W[W > max] <- 0                     # far points get zero weight
    W <- ifelse(W != 0, DistFunc(W), W) # apply the weight function to non-zero distances
    if (rownormal && sum(W) != 0) W <- W / sum(W) # rows without neighbours stay zero, not NaN
    M[pos[r], SubGroup$Pos] <- W        # write the row into the sparse matrix
  }
  return(M)
}
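Assigning into a sparse matrix one row at a time can be slow, because each assignment may rebuild the matrix. As a hedged alternative, the sketch below collects (row, column, value) triplets and builds the matrix once with Matrix::sparseMatrix. It is not the answer's method: the names haversine_km and spatial_weight_triplets are hypothetical, it uses a spherical haversine distance instead of distGeo, the default radius of 6371 km is an assumed mean Earth radius, and it skips the bounding-rectangle filter, so it still computes n distances per row (but never holds more than one row of them in memory).

```r
library(Matrix)

# Great-circle (haversine) distance in km; lon2/lat2 may be vectors.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a))) # pmin guards against rounding above 1
}

# Hypothetical helper: inverse-distance weights within max_km, built from triplets.
spatial_weight_triplets <- function(lon, lat, max_km, radius = 6371) {
  n <- length(lon)
  rows <- vector("list", n)
  cols <- vector("list", n)
  vals <- vector("list", n)
  for (r in seq_len(n)) {
    d <- haversine_km(lon[r], lat[r], lon, lat, radius) # one row of distances
    keep <- which(d > 0 & d <= max_km)                  # drops self and far points
    rows[[r]] <- rep.int(r, length(keep))
    cols[[r]] <- keep
    vals[[r]] <- 1 / d[keep]
  }
  # Build the sparse matrix in a single call from the collected triplets.
  sparseMatrix(i = unlist(rows), j = unlist(cols), x = unlist(vals), dims = c(n, n))
}

# The four points from the question, with the radius assumed to match its example.
W <- spatial_weight_triplets(c(0, 45, 45, 180), c(0, 45, 0, 0), 15000, radius = 6378.388)
```

For the full 40,000 points, the per-row distance step could be combined with the latitude and longitude pre-filter from the answer, so that only nearby candidates are ever measured.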