Question

我编写了以下代码来查找给定向量的所有排列：

perm <- function(v, r = NULL, P = NULL) {
  l <- length(v)
  if (l == 0) {
    P <- rbind(P, r)
    rownames(P) <- NULL
    P
  } else {
    for (i in 1:l) {
      new_r <- c(r, v[i])
      new_v <- v[-i]
      P <- perm(new_v, new_r, P)
    }
    P
  }
}

P <- perm(1:9) # takes "forever" yet e.g. perm(1:7) is quite fast!?!
P

它做了应有的工作，但问题是，如果使用长度大于8的向量（如上所述），它将永远运行。

我的问题
我真的没看到问题，我发现一些递归实现看起来没有什么不同，但是效率更高...那么有没有一种简单的方法可以优化代码，使其运行更快？

Answer 1

最好使用permGeneral中的RcppAlgos

P <- perm(1:5) # OP's function

library(RcppAlgos)
P1 <- permuteGeneral(5, 5)
all.equal(P, P1, check.attributes = FALSE)
#[1] TRUE

基准

序列稍长

system.time({

   P2 <- permuteGeneral(8, 8)
  })
#user  system elapsed 
#  0.001   0.000   0.001 

system.time({

   P20 <- perm(1:8) #OP's function
})
# user  system elapsed 
# 31.254  11.045  42.226 

all.equal(P2, P20, check.attributes = FALSE)
#[1] TRUE

通常，递归函数会花费更长的时间，因为对该函数的递归调用会花费更多的执行时间

Answer 2

如@akrun所述，R中的递归通常效率不高。但是，如果您必须具有递归解决方案，则不要比gtools::permutations多。这是实现：

permGtools <- function(n, r, v) {
    if (r == 1) 
        matrix(v, n, 1)
    else if (n == 1) 
        matrix(v, 1, r)
    else {
        X <- NULL
        for (i in 1:n) X <- rbind(X, cbind(v[i], permGtools(n - 1, r - 1, v[-i])))
        X
    }
}

要获取完整的源代码，只需在控制台中键入gtools::permutations并按Enter。有关更多信息，请参见How can I view the source code for a function?

以下是一些时间安排：

system.time(perm(1:8))
  user  system elapsed 
34.074  10.641  44.815

system.time(permGtools(8,8,1:8))
 user  system elapsed 
0.253   0.001   0.255

出于很好的考虑：

system.time(permGtools(9, 9, 1:9))
 user  system elapsed 
2.512   0.046   2.567

为什么OP的执行速度较慢？

如果不阅读详细信息，请跳至摘要。

对于初学者来说，我们可以简单地看到OP的实现比gtools中的实现进行了更多的递归调用。为了说明这一点，我们将count <<- count + 1L添加到每个函数的顶部（注：我们正在使用<<-赋值运算符，它首先搜索父环境）。例如：

permGtoolsCount <- function(n, r, v) {
    count <<- count + 1L
    if (r == 1)
    .
    .

现在我们测试一些长度：

iterationsOP <- sapply(4:7, function(x) {
    count <<- 0L
    temp <- permCount(1:x)
    count
})

iterationsOP
[1]    65   326  1957 13700

iterationsGtools <- sapply(4:7, function(x) {
    count <<- 0L
    temp <- permGtoolsCount(x, x, 1:x)
    count
})

iterationsGtools
[1]   41  206 1237 8660

如您所见，OP的实现在每种情况下都会进行更多的调用。实际上，它的数量约为递归调用量的1.58...倍。

iterationsOP / iterationsGtools
[1] 1.585366 1.582524 1.582053 1.581986

正如我们已经说过的，R中的递归声誉很差。除了R does not employ tail-recursion之外，我找不到其他确切的原因。

在这一点上，似乎很难相信，进行约1.58倍的递归调用可以解释我们上面看到的175倍的速度（即44.815 / 0.255 ~= 175）。

我们可以使用Rprof来分析代码，以收集更多信息：

Rprof("perm.out", memory.profiling = TRUE)
a1 <- perm(1:8)
Rprof(NULL)
summaryRprof("perm.out", memory = "both")$by.total
             total.time total.pct mem.total self.time self.pct
"perm"            43.42    100.00   15172.1      0.58     1.34
"rbind"           22.50     51.82    7513.7     22.50    51.82
"rownames<-"      20.32     46.80    7388.7     20.30    46.75
"c"                0.02      0.05      23.7      0.02     0.05
"length"           0.02      0.05       0.0      0.02     0.05



Rprof("permGtools.out", memory.profiling = TRUE)
a2 <- permGtools(8, 8, 1:8)
Rprof(NULL)
summaryRprof("permGtools.out", memory = "tseries")$by.total
             total.time total.pct mem.total self.time self.pct
"rbind"            0.34    100.00     134.8      0.18    52.94
"cbind"            0.34    100.00     134.8      0.08    23.53
"permGtools"       0.34    100.00     134.8      0.06    17.65
"matrix"           0.02      5.88       0.0      0.02     5.88

立即（而不是时间）跳出来的一件事是OP的实现占用大量内存。 OP的实现大约使用 15 Gb 的内存，而gtools实现仅使用 134 Mb 。

深入挖掘

在上面，我们通过将memory参数设置为both来简单地查看一般视图中的内存使用情况。还有一个名为tseries的设置，可让您查看一段时间内的内存使用情况。

head(summaryRprof("perm.out", memory = "tseries"))
     vsize.small vsize.large    nodes duplications       stack:2
0.02     4050448    25558992 49908432         2048 "perm":"perm"
0.04       98808    15220400  1873760          780 "perm":"perm"
0.06       61832    12024184  1173256          489 "perm":"perm"
0.08       45400           0   861728          358 "perm":"perm"
0.1            0    14253568        0          495 "perm":"perm"
0.12       75752    21412320  1436120          599 "perm":"perm"

head(summaryRprof("permGtools.out", memory = "tseries"))
     vsize.small vsize.large    nodes duplications              stack:2
0.02     4685464    39860824 43891512            0 "permGtools":"rbind"
0.04      542080      552384 12520256            0 "permGtools":"rbind"
0.06           0           0        0            0 "permGtools":"rbind"
0.08      767992     1200864 17740912            0 "permGtools":"rbind"
0.1       500208      566592 11561312            0 "permGtools":"rbind"
0.12           0      151488        0            0 "permGtools":"rbind"

这里有很多事情要做，但是要重点关注的是duplications字段。在summaryRprof的文档中，我们有：

它还记录在该时间间隔内对内部函数重复项的调用次数。需要复制参数时，C代码会调用重复项。

比较每个实现中的副本数：

sum(summaryRprof("perm.out", memory = "tseries")$duplications)
[1] 121006

sum(summaryRprof("permGtools.out", memory = "tseries")$duplications)
[1] 0

因此，我们看到OP的实现需要制作许多副本。考虑到所需对象是函数原型中的参数，我想这不足为奇。也就是说，P是要返回的排列矩阵，并且每次迭代都会越来越大。每次迭代时，我们会将其传递给perm。您会在gtools实现中注意到，事实并非如此，因为它只是两个数字值和一个参数向量。

摘要

因此，有了OP的原始实现，它不仅进行了更多的递归调用，还需要许多副本，这些副本又使内存陷入困境，从而极大地降低了效率。

如何优化递归函数以查找所有排列？

2 个答案:

基准

为什么OP的执行速度较慢？

深入挖掘

摘要