Question

Below is a simplified version of the code I am working on:

# Randomly populated matrices as example/toy data.

adata <- sample(1:25,2500,replace=TRUE)
xdim <- 50
ydim <- 50
dim(adata) <- c(xdim,ydim)

# All possible categories
cats <- 1:25

# Retrieve the coordinates of the members of each class
# These subsets are used to calculate the minimum distances.

subA <- list(NULL)

for(i in seq_along(cats)){
  subA[[i]] <- which(adata==cats[i],arr.ind=TRUE)
}


# Matrix containing Euclidean distances for each combination of dX and dY
eucdist <- matrix(nrow=xdim,ncol=ydim)
eucdist <- sqrt(((row(eucdist)-1)^2)+((col(eucdist)-1)^2))

# Create an array containing the x,y coordinates of each cell (in pixels). Used to calculate distances

xycoor <- array(dim=c(xdim,ydim,2))
xycoor[,,1] <- row(xycoor[,,1])
xycoor[,,2] <- col(xycoor[,,2])


# For each cell, find the minimum distance to a cell belonging to each category

mindistA <- array(dim=c(xdim,ydim,length(cats)))

for(i in seq_along(cats)){
  # Distance from every cell to the nearest cell of class i (on the same map):
  # x holds the (row, col) of the current cell; the absolute offsets to each
  # class member are used as indices into the eucdist lookup table.

  mindistA[,,i] <- apply(xycoor,1:2,function(x) min(eucdist[abs(sweep(subA[[i]],2,x,"-"))+1],na.rm=TRUE))
}

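The purpose of this is to:

Create a list containing, for each class in cats, the row and column numbers of the cells belonging to that class
For each cell of my matrix, calculate the distance to the nearest cell of each category.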
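
To make the two steps concrete, here is a minimal brute-force sketch on a hand-made 3 x 3 matrix. The names m, members and min_dist_to are made up for this illustration and are not part of my real code; the loop is only meant as a reference to check faster versions against, not as something that would scale.

```r
# Tiny hand-made map: class 1 sits in two opposite corners, class 2 elsewhere.
m <- matrix(c(1, 2, 2,
              2, 2, 2,
              2, 2, 1), nrow = 3)
cats <- sort(unique(as.vector(m)))

# Step 1: for each class, the (row, col) coordinates of its member cells.
members <- lapply(cats, function(k) which(m == k, arr.ind = TRUE))

# Step 2: brute-force minimum Euclidean distance from every cell to a set of
# points (hypothetical helper, written for clarity rather than speed).
min_dist_to <- function(pts, nr, nc) {
  out <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      out[i, j] <- min(sqrt((i - pts[, 1])^2 + (j - pts[, 2])^2))
    }
  }
  out
}

# Distance from every cell to the nearest cell of class 1.
d1 <- min_dist_to(members[[1]], nrow(m), ncol(m))
```

Here d1 is 0 in the two corners holding class 1, and the centre cell gets sqrt(2), since it is one step diagonally away from either corner.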

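This works on my toy data, but I am now running it on a much larger dataset (my matrices have about 1000 * 2000 elements, rather than 50 * 50 as in the example here), and it is very slow; it will most likely take several days to finish. Based on this question, I suspect that using an expression as an index to access the elements of eucdist is what slows down execution.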

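One possible solution would be to 1) create an array of dimension length(subA[[i]]) * 50 * 50 * 2 containing the differences in X and Y coordinates between each grid cell and each cell belonging to class i, and 2) use the contents of this array to look up the required values in 'eucdist' (possibly using rownames and colnames, as suggested in the comments of the question linked above). However, I am afraid that in my case this would generate arrays too large to handle.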
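
A rough sketch of that idea for a single class on the toy data might look as follows. The seed and the names pts, dx, dy, dX, dY, d and mindist_i are assumptions made for this illustration, and I have not tried it on the full-size data.

```r
# Toy setup mirroring the code above (the fixed seed is an added assumption).
set.seed(1)
xdim <- 50
ydim <- 50
adata <- matrix(sample(1:25, xdim * ydim, replace = TRUE), xdim, ydim)
eucdist <- sqrt((row(adata) - 1)^2 + (col(adata) - 1)^2)

i <- 1                                      # class to process
pts <- which(adata == i, arr.ind = TRUE)    # n x 2 matrix of (row, col)
n <- nrow(pts)

# Step 1: absolute offsets between every grid row/column and every class cell.
dx <- abs(outer(seq_len(xdim), pts[, 1], "-"))   # xdim x n
dy <- abs(outer(seq_len(ydim), pts[, 2], "-"))   # ydim x n

# Expand to xdim x ydim x n, so that dX[r, c, k] == dx[r, k]
# and dY[r, c, k] == dy[c, k].
dX <- aperm(array(dx, dim = c(xdim, n, ydim)), c(1, 3, 2))
dY <- aperm(array(dy, dim = c(ydim, n, xdim)), c(3, 1, 2))

# Step 2: a single vectorised lookup in eucdist using a two-column index matrix.
d <- eucdist[cbind(as.vector(dX) + 1, as.vector(dY) + 1)]
dim(d) <- c(xdim, ydim, n)

# Minimum over the class cells for every grid cell.
mindist_i <- apply(d, c(1, 2), min)
```

The intermediate array d has xdim * ydim * n elements. For a matrix of about 1000 * 2000 cells, and assuming the 25 classes are of roughly equal size, that would be on the order of 10^11 values per class, which is the size problem I am worried about; the grid would presumably have to be processed in chunks.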

Does anyone have suggestions for speeding this up? Any advice would be greatly appreciated!

Thanks!

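PS: I previously tried computing the Euclidean distances directly with the rdist function from the fields package inside apply, but that was also very slow, which is why I decided to use a lookup table of Euclidean distances instead.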

R: slow matrix indexing using expressions as subscripts

0 answers: