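Quick summary: I need a function that takes longitude and latitude data and outputs a matrix in which Wij is the inverse distance if distance(i,j) is below the maximum distance, and zero if it is above. It needs to work on large data sets.
Details: I have a data set of 40,000 observations on deforestation. I am studying how deforestation is spatially correlated, so I want to create a spatial weights matrix. The data are the longitudes and latitudes of the centroids of different regions.
I want W = SpatWeight(long, lat, max), where long and lat are coordinate vectors, max is a scalar distance in km, and W is a sparse matrix in which Wij is the inverse distance (in km) between point i and point j (Wij = 0 if i == j or dist(i,j) > max).
As an example, I would like SpatWeight(c(0,45,45,180),c(0,45,0,0),15000) to output: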
[1,] . 1.497133e-04 0.0001996178 .
[2,] 0.0001497133 . 0.0001996178 7.485666e-05
[3,] 0.0001996178 1.996178e-04 . .
[4,] . 7.485666e-05 . .
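The expected values above can be checked by hand with a small base-R sketch. It assumes a spherical Earth with radius 6378.388 km, a value chosen here because it reproduces the numbers shown; great_circle_km is a hypothetical helper name and is not part of any package.

```r
# Hypothetical helper: great-circle distance in km (spherical law of cosines).
# radius = 6378.388 km is an assumption that reproduces the output above.
great_circle_km <- function(lon1, lat1, lon2, lat2, radius = 6378.388) {
  rad <- pi / 180
  cos_d <- sin(lat1 * rad) * sin(lat2 * rad) +
    cos(lat1 * rad) * cos(lat2 * rad) * cos((lon2 - lon1) * rad)
  radius * acos(pmin(1, pmax(-1, cos_d)))  # clamp against rounding error
}

long <- c(0, 45, 45, 180)
lat  <- c(0, 45, 0, 0)

# Full 4 x 4 distance matrix (fine at this size, not for 40,000 points).
D <- outer(seq_along(long), seq_along(long),
           function(i, j) great_circle_km(long[i], lat[i], long[j], lat[j]))

# Inverse distance within 15000 km, zero otherwise and on the diagonal.
W <- ifelse(D > 0 & D <= 15000, 1 / D, 0)
diag(W) <- 0
print(signif(W, 7))
```

Points 1 and 4 are 180 degrees apart and points 3 and 4 are 135 degrees apart (about 15029 km on this sphere), so both pairs fall beyond the 15000 km cutoff and get zero weight.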
The code I have written so far is:
library(fields)  # for rdist.earth
library(Matrix)  # for Matrix(..., sparse = TRUE)
SpatWeight <- function(long, lat, max) {
  W <- rdist.earth(cbind(long, lat), miles = FALSE)  # dense n x n distance matrix in km
  diag(W) <- 0                   # self-distances are not always exactly zero
  W[W > max] <- 0                # observations beyond max distance get zero weight
  nz <- W != 0
  W[nz] <- 1 / W[nz]             # invert the non-zero distances
  W <- Matrix(W, sparse = TRUE)  # convert to a sparse matrix
  return(W)
}
My problem, besides the code not being well written :-), is that with the actual data the matrix is too large for the computer's memory.
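A rough back-of-the-envelope calculation shows the scale of the problem. The dense size follows directly from the 40,000 observations; the sparse estimate assumes a hypothetical average of 50 neighbours within max per point, which is an illustrative figure and not taken from the data.

```r
n <- 40000                 # number of observations in the question
bytes_per_double <- 8      # R stores numeric values as 8-byte doubles

# Dense n x n distance matrix, as returned before any sparsification.
dense_gb <- n^2 * bytes_per_double / 1e9
dense_gb                   # 12.8

# Sparse storage: roughly 8 bytes (value) + 4 bytes (row index) per non-zero,
# ignoring the small column-pointer vector.
k <- 50                    # ASSUMPTION: hypothetical average neighbours per point
sparse_mb <- n * k * (8 + 4) / 1e6
sparse_mb                  # 24
```

So the final sparse matrix is likely to be small; it is the intermediate dense matrix, plus any copies made while transforming it, that exhausts memory.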
Any help on how to solve this would be greatly appreciated. Thanks again! -Sean
Answer 0 (score: 0)
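I ended up writing code myself that runs on large data sets. I am posting it here in case any wayward traveler runs into the same problem I did. Hope it helps someone :-)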
library(Matrix)     # for sparse Matrix()
library(geosphere)  # for distGeo()
SpatialWeight <- function(SortedData, max, DistFunc = function(x) 1 / x, rownormal = FALSE) {
  # Creates a spatial weights matrix for a large data set.
  #
  # Args:
  #   SortedData: data frame with columns Pos, latitude and longitude
  #     - Pos is a position vector 1:nrow(SortedData); it is used as the column
  #       index of the result, so it has to match the row order after sorting
  #     - latitude must be sorted from smallest to largest
  #   max: maximum distance in km for spatial correlation
  #   DistFunc: function to calculate spatial weights. Defaults to inverse distance 1/dist(i,j)
  #   rownormal: whether the matrix should be normalized by row (TRUE or FALSE). Defaults to FALSE.
  # Returns:
  #   Sparse matrix of size nrow(SortedData) x nrow(SortedData) of spatial weights.
  DegLat <- 40008 / 360   # km per degree of latitude, from the circumference through the poles
  DegLong <- 40075 / 360  # km per degree of longitude at the equator; a degree of longitude is
                          # shorter away from the equator, so change this for high-latitude data
  len <- nrow(SortedData)         # length of data
  lat <- SortedData$latitude      # latitude data
  long <- SortedData$longitude    # longitude data
  M <- Matrix(0, nrow = len, ncol = len, sparse = TRUE)  # sparse result matrix
  latmin <- 1; latmax <- 1        # row range of the current latitude band
  for (r in seq_len(len)) {       # main loop
    LatMin <- lat[r] - max / DegLat     # minimum latitude
    LatMax <- lat[r] + max / DegLat     # maximum latitude
    LongMin <- long[r] - max / DegLong  # minimum longitude
    LongMax <- long[r] + max / DegLong  # maximum longitude
    # Restrict to rows whose latitude is within max of row r. This reduces the set
    # we need to calculate distances for and therefore increases the speed.
    # !!! DATA MUST BE SORTED BY LATITUDE !!!
    # Because lat is ascending, both ends of the band only ever move forward.
    # (The original lower-bound loop stepped backward from 1 and so never advanced.)
    while (latmin < len && lat[latmin] < LatMin) latmin <- latmin + 1
    while (latmax < len && lat[latmax] < LatMax) latmax <- latmax + 1
    SubGroup <- SortedData[latmin:latmax, ]
    # Next, subset by longitude to get a bounding rectangle.
    SubGroup <- SubGroup[SubGroup$longitude > LongMin & SubGroup$longitude < LongMax, ]
    # Distance in km from row r to every row of the subset (distGeo returns metres).
    W <- distGeo(c(long[r], lat[r]), as.matrix(SubGroup[, c("longitude", "latitude")])) / 1000
    W[W > max] <- 0               # far distances get zero weight
    nz <- W != 0
    W[nz] <- DistFunc(W[nz])      # apply distance function (default inverse distance)
    if (rownormal && sum(W) > 0) W <- W / sum(W)  # normalize by row, skipping empty rows
    M[r, SubGroup$Pos] <- W       # add the row to the sparse matrix
  }
  return(M)
}
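The function above assigns into the sparse matrix one row at a time, which can be slow for large matrices. One possible alternative, sketched below, is to collect (row, column, value) triplets in the loop and build the matrix once with sparseMatrix() from the Matrix package. This is an illustration and not the answer's method: spat_weight_triplets and haversine_km are hypothetical names, distances use the haversine formula on a sphere of assumed radius 6371 km instead of distGeo, and only the latitude band is used for pre-filtering.

```r
library(Matrix)  # sparseMatrix()

# Hypothetical helper: great-circle distance in km on a sphere (haversine).
# radius = 6371 km is an assumed mean Earth radius.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical variant: gather triplets, then build the sparse matrix once.
spat_weight_triplets <- function(long, lat, max) {
  ord <- order(lat)               # sort by latitude for the sliding window
  slon <- long[ord]
  slat <- lat[ord]
  n <- length(slat)
  band <- max / (40008 / 360)     # half-width of the latitude band in degrees
  lo <- 1
  hi <- 1
  rows <- vector("list", n)
  cols <- vector("list", n)
  vals <- vector("list", n)
  for (r in seq_len(n)) {
    # Both window ends only move forward because slat is ascending.
    while (lo < n && slat[lo] < slat[r] - band) lo <- lo + 1
    while (hi < n && slat[hi + 1] <= slat[r] + band) hi <- hi + 1
    idx <- lo:hi
    d <- haversine_km(slon[r], slat[r], slon[idx], slat[idx])
    keep <- idx != r & d > 0 & d <= max   # drop self, coincident and far points
    rows[[r]] <- rep(ord[r], sum(keep))   # map back to the original row order
    cols[[r]] <- ord[idx[keep]]
    vals[[r]] <- 1 / d[keep]              # inverse-distance weight
  }
  sparseMatrix(i = unlist(rows), j = unlist(cols), x = unlist(vals),
               dims = c(n, n))
}

W <- spat_weight_triplets(c(0, 45, 45, 180), c(0, 45, 0, 0), 15000)
print(W)
```

Because the result is indexed by the original row order, the input does not need to be pre-sorted or carry a Pos column. The weights differ slightly from the question's example output because of the different assumed Earth radius.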