Question

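I want to create a distance matrix between companies based on their geographic locations.

I have a square distance matrix containing the distances between 98 Italian provinces. I also have a data frame with two columns: one holds the ID numbers of 8376 companies, and the other gives the province (one of the 98) in which each company is located.

I want to create an 8376-by-8376 distance matrix containing the distances between all companies. The code I wrote (below) is very inefficient. Is there a faster way to do this? I ask because I need to do it for multiple datasets.

This is what the data frame looks like: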

   cid province
1  61       TO
2 102       TO
3 123       AT
4 127       TO
5 158       TO
6 225       NO
7 232       TO
8 388       TO

This is what the square distance matrix looks like:

      CH     AQ     PE     TE
1   0.00  64.39  41.74  81.18
2  64.39   0.00  40.38  61.05
3  41.74  40.38   0.00  40.79
4  81.18  61.05  40.79   0.00


outcome = matrix(NA,8376,8376)  # empty matrix

for(i in 1:8375){  # stop at 8375 so that (i+1):8376 never counts down past the last row
  for(j in (i+1):8376){
    x=which(dist.codes[,1]==companyID_Province[i,2]) # Find the row index in the distance matrix 
    y=which(dist.codes[1,]==companyID_Province[j,2]) # Find the column index in the distance matrix
    outcome[i,j] = dist.codes[x,y] # Specify the distance to the corresponding element in outcome matrix
  }
}

Answer 1

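If dist.codes is the distance matrix of the provinces and province[i] is the province of the company with ID i, then dist.codes[province,province] is the distance matrix of the companies. If company is a data frame with the company IDs in company$ID and the provinces in company$province, then company$province[order(company$ID)] is the vector province from above, ordered by company ID.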
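
To make the indexing idea concrete, here is a minimal sketch using the four provinces shown in the question. It assumes the province matrix carries the province codes as row and column names; the company IDs and their province assignments are invented for illustration only.

```r
# Province-level distances, copied from the sample in the question.
prov <- c("CH", "AQ", "PE", "TE")
D <- matrix(c( 0.00, 64.39, 41.74, 81.18,
              64.39,  0.00, 40.38, 61.05,
              41.74, 40.38,  0.00, 40.79,
              81.18, 61.05, 40.79,  0.00),
            nrow = 4, byrow = TRUE, dimnames = list(prov, prov))

# Hypothetical companies (IDs and provinces are made up for this example).
company <- data.frame(ID = c(30, 10, 20, 40),
                      province = c("PE", "CH", "TE", "CH"),
                      stringsAsFactors = FALSE)

# Province of each company, ordered by company ID.
province <- company$province[order(company$ID)]

# Index rows and columns by province name: no explicit loops needed.
outcome <- D[province, province]

# Label rows and columns with the (sorted) company IDs.
ids <- as.character(sort(company$ID))
dimnames(outcome) <- list(ids, ids)

outcome["10", "30"]  # distance between the provinces of companies 10 and 30
```

If the province column is stored as a factor, converting it with as.character() first is the safer choice, because indexing with a factor uses its integer codes rather than its labels.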

I have compared your code with my proposal:

SpeedComparison <- function(N,M)
{
  set.seed(1)

  dist.codes <- matrix(sample(1:1000,N*N,rep=TRUE),N,N) / 100
  dist.codes <- dist.codes * t(dist.codes)
  diag(dist.codes) <- 0
  dist.codes <- cbind(0:N,rbind(1:N,dist.codes)) # Add an additional row and an additional column with province numbers.

  companyID_Province <- data.frame( ID = 1:M, province = sample(1:N,M,replace=TRUE) )

  #---------------------------------------------------------------------

  tm.1 <- 0.01 * system.time(
    for ( run in 1:100)  # repeat 100 times; the 0.01 factor above averages the timing
    {
      outcome.1 = matrix(0,M,M)  # empty matrix

      for(i in 1:(M-1)){
        x=which(dist.codes[,1]==companyID_Province[i,2]) # Find the row index in the distance matrix 
        for(j in (i+1):M){
          y=which(dist.codes[1,]==companyID_Province[j,2]) # Find the column index in the distance matrix
          outcome.1[i,j] = dist.codes[x,y] # Specify the distance to the corresponding element in outcome matrix
        }
      }
    }
  )

  tm.2 <- 0.01 * system.time(
    for ( run in 1:100)  # repeat 100 times; the 0.01 factor above averages the timing
    {
      D <- dist.codes[-1,][,-1] # The additional row/column is not used here.
      outcome.2 <- D[companyID_Province[,2],companyID_Province[,2]]
    }
  )

  list( outcome = list( outcome.1+t(outcome.1), outcome.2 ),
        time    = list( tm.1, tm.2 ) )
}

#======================================================================

N <- 50

Comparison <- as.data.frame(matrix(NA,0,4))

for ( M in c(100,150,200,250,300) )
{
  Test <- SpeedComparison(N,M)

  Comparison <- rbind( Comparison,
                       c( M,
                          Test$time[[1]][3],
                          Test$time[[2]][3],
                          identical(Test$outcome[[1]],Test$outcome[[2]])))
}  

names(Comparison) <- c("M","time.1","time.2","outcomes.identical")

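The outcomes are identical (a "1" stands for TRUE), but the timings are not: in this benchmark, matrix indexing is several hundred to nearly two thousand times faster than the nested loops.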

> Comparison
    M time.1 time.2 outcomes.identical
1 100 0.2568  2e-04                  1
2 150 0.5661  5e-04                  1
3 200 1.1845  7e-04                  1
4 250 1.9568  1e-03                  1
5 300 2.8602  4e-03                  1
>
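
In the benchmark the province labels are the integers 1..N, so they double as row and column positions. With real province codes such as TO or AT that is not the case. One possible way to handle this, sketched below, is to translate the codes into positions with match() before indexing. The vector codes (the order of the provinces in the matrix) and the sample companies are assumptions made for this example, not part of the original data.

```r
# Assumed: 'codes' lists the province codes in the same order as the
# rows and columns of the plain numeric matrix D.
codes <- c("CH", "AQ", "PE", "TE")
D <- matrix(c( 0.00, 64.39, 41.74, 81.18,
              64.39,  0.00, 40.38, 61.05,
              41.74, 40.38,  0.00, 40.79,
              81.18, 61.05, 40.79,  0.00),
            nrow = 4, byrow = TRUE)

# Hypothetical company table, already sorted by ID.
companyID_Province <- data.frame(ID = c(10, 20, 30, 40),
                                 province = c("PE", "CH", "TE", "CH"),
                                 stringsAsFactors = FALSE)

# Translate each company's province code into a row/column position.
idx <- match(companyID_Province$province, codes)

# Fail early if a company refers to a province missing from the matrix.
stopifnot(!anyNA(idx))

# Company-by-company distance matrix via integer indexing.
outcome <- D[idx, idx]
```

For the full data set the result is a dense 8376 x 8376 numeric matrix, which at 8 bytes per entry takes roughly 560 MB of memory, so it may be worth checking available RAM before building several of them.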
