How can I vectorize a for loop used for permutation?

Time: 2019-10-26 14:59:09

Tags: r performance for-loop vectorization

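I am using R for an analysis and want to perform a permutation test. For this I use a for loop that is quite slow, and I would like to make the code as fast as possible. I think vectorization is key to this. However, after several days of trying I still have not found a suitable way to recode it. I would greatly appreciate your help!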

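I have a symmetric matrix of pairwise ecological distances between populations ("dist.mat"). I want to randomly shuffle the rows and columns of this distance matrix to generate a permuted distance matrix ("dist.mat.mix"). Then I want to save the upper triangular values of this permuted distance matrix (there are "nr.pairs" of them). This process should be repeated several times ("nr.runs"). The result should be a matrix ("result") that holds the permuted upper triangular values of one run per row, with dimensions nrow = nr.runs and ncol = nr.pairs. The example R code below does what I want using a for loop: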

# example number of populations
nr.pops <- 20

# example distance matrix
dist.mat <- as.matrix(dist(matrix(rnorm(nr.pops * 5), nr.pops, 5)))  # one random value per cell, no recycling

# example number of runs
nr.runs <- 1000

# find number of unique pairwise distances in distance matrix
nr.pairs <- nr.pops*(nr.pops-1) / 2

# start loop
result <- matrix(NA, nr.runs, nr.pairs)
for (i in 1:nr.runs) {
  mix <- sample(nr.pops, replace=FALSE)
  dist.mat.mix <- dist.mat[mix, mix]
  result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
}

# inspect result
result
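
A quick way to convince oneself that the loop does what is described is to check that every row of the result contains exactly the same distances as the original upper triangle, only in a different order. The following is a minimal sketch of such a check on a small example; the helper name `check_permutation` and the small matrix `d` are hypothetical and only serve as illustration.

```r
# Hypothetical helper: TRUE if every row of `result` holds the same values
# as the upper triangle of `dist.mat`, only in a different order.
check_permutation <- function(result, dist.mat) {
  reference <- sort(dist.mat[upper.tri(dist.mat)])
  all(apply(result, 1, function(r) isTRUE(all.equal(sort(r), reference))))
}

# Small self-contained example with 5 populations and 3 runs.
set.seed(42)
d <- as.matrix(dist(matrix(rnorm(25), 5, 5)))
res <- t(replicate(3, {
  mix <- sample(5)                 # random order of the populations
  d[mix, mix][upper.tri(d)]        # upper triangle of the permuted matrix
}))
check_permutation(res, d)
```

The check relies on the distance matrix being symmetric: permuting rows and columns with the same index only relabels the populations, so the set of pairwise distances stays the same.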

I have made some clumsy vectorization attempts with the base::replicate function, but this does not speed things up. It is actually slightly slower:

# my for loop approach
my.for.loop <- function() {
  result <- matrix(NA, nr.runs, nr.pairs)
  for (i in 1:nr.runs){
    mix <- sample(nr.pops, replace=FALSE)
    dist.mat.mix <- dist.mat[mix ,mix]
    result[i, ] <- dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
  }
  result  # return the filled matrix so the function is usable outside the benchmark
}

# my replicate approach
my.replicate <- function() {
  results <- t(replicate(nr.runs, {
    mix <- sample(nr.pops, replace=FALSE)
    dist.mat.mix <- dist.mat[mix, mix]
    dist.mat.mix[upper.tri(dist.mat.mix, diag=FALSE)]
  }))
}

# compare speed
require(microbenchmark)
microbenchmark(my.for.loop(), my.replicate(), times=100L)

# Unit: milliseconds
# expr           min     lq      mean    median  uq      max       neval
# my.for.loop()  23.1792 24.4759 27.1274 25.5134 29.0666 61.5616   100
# my.replicate() 25.5293 27.4649 30.3495 30.2533 31.4267 68.6930   100    

I would be very grateful for your support if you know how to speed up my for loop with a neat vectorized solution. Is this even possible?

1 Answer:

Answer 0: (score: 1)

A bit faster:

minem <- function() {
  result <- matrix(NA, nr.runs, nr.pairs)
  # create the upper triangular index matrix once, outside the loop;
  # it must have the same dimensions as dist.mat
  ut <- upper.tri(matrix(NA, nr.pops, nr.pops))
  for (i in 1:nr.runs) {
    mix <- sample.int(nr.pops) # slightly faster sampling function
    result[i, ] <- dist.mat[mix, mix][ut]
  }
  result
}
microbenchmark(my.for.loop(), my.replicate(), minem(), times = 100L)
# Unit: microseconds
# expr               min      lq      mean   median       uq      max neval cld
# my.for.loop()   75.062  78.222  96.25288  80.1975 104.6915  249.284   100   a
# my.replicate() 118.519 122.667 152.25681 126.0250 165.1355  495.407   100   a
# minem()         45.432  48.000 104.23702  49.5800  52.9380 4848.986   100   a

Update: We can get the necessary matrix indices in a different way, so that the elements can be subset in one step:

minem4 <- function() {
  n <- dim(dist.mat)[1]
  ut <- upper.tri(matrix(NA, n, n))
  im <- matrix(1:n, n, n)
  p1 <- im[ut]     # row index of each upper-triangle cell
  p2 <- t(im)[ut]  # column index of each upper-triangle cell
  dm <- unlist(dist.mat)
  si <- replicate(nr.runs, sample.int(nr.pops)) # one permutation per column
  p <- (si[p1, ] - 1L) * n + si[p2, ] # linear indices into dm (relies on dist.mat being symmetric)
  result2 <- matrix(dm[p], nr.runs, nr.pairs, byrow = TRUE)
  result2
}
microbenchmark(my.for.loop(), minem(), minem4(), times = 100L)
# Unit: milliseconds
# expr                min        lq     mean    median        uq       max neval cld
# my.for.loop() 13.797526 14.977970 19.14794 17.071401 23.161867  29.98952   100   b
# minem()        8.366614  9.080490 11.82558  9.701725 15.748537  24.44325   100   a
# minem4()       7.716343  8.169477 11.91422  8.723947  9.997626 208.90895   100   a

Update2: We can use the sample function from the dqrng package to increase speed.
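
The dqrng-based code for this last update is not included above. As a minimal sketch, assuming the idea is simply to swap `sample.int` for `dqrng::dqsample.int` inside the index-based approach of `minem4`, it might look like the following. The function name `perm_upper_dq`, its arguments, and the fallback to base `sample.int` when dqrng is not installed are assumptions made for illustration and are not part of the original answer.

```r
# Sketch: permuted upper-triangle values via linear indexing.
# `perm_upper_dq` is a hypothetical name; using dqrng here is an assumption.
perm_upper_dq <- function(dist.mat, nr.runs) {
  n <- nrow(dist.mat)
  ut <- upper.tri(dist.mat)   # logical mask of the upper triangle
  rows <- row(dist.mat)[ut]   # row index of each upper-triangle cell
  cols <- col(dist.mat)[ut]   # column index of each upper-triangle cell
  # Use dqrng's sampler if the package is installed, else fall back to base R.
  sampler <- if (requireNamespace("dqrng", quietly = TRUE)) {
    dqrng::dqsample.int
  } else {
    sample.int
  }
  # One permutation of 1..n per column.
  perms <- replicate(nr.runs, sampler(n))
  # Column-major linear index of dist.mat[perm[rows], perm[cols]] for each run.
  idx <- (perms[cols, , drop = FALSE] - 1L) * n + perms[rows, , drop = FALSE]
  # Index a plain vector so that idx is never mistaken for (row, col) pairs.
  dm <- as.vector(dist.mat)
  matrix(dm[idx], nrow = nr.runs, byrow = TRUE)
}
```

With the objects from the question, a call would be `result <- perm_upper_dq(dist.mat, nr.runs)`. Whether this is actually faster than `minem4` depends on the matrix size and the number of runs, so it is worth benchmarking on your own data.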