Question

我想编写一个R代码，以有效的方式生成具有重复字符的列表的所有不同排列。例如，

x<-c(1,1,2,2,3,4);    
library(combinat);
unique(permn(x))

有效，但是如果向量x的长度稍微长一点，则效率非常低并且不起作用。有人知道如何以有效的方式生成上述序列的唯一排列吗？

Answer 1

排列是笨拙的野兽。从一组 n 中选择 r 对象时获得的排列数为

perm

当选择所有这些时，意味着 r = n ，这将减少为

permall

对于一组6个值，这只是720，这并不令人印象深刻，但只要看看这些数字在增加集合大小时的爆炸速度有多快：

data.frame(n=1:12,P=factorial(1:12));
##     n         P
## 1   1         1
## 2   2         2
## 3   3         6
## 4   4        24
## 5   5       120
## 6   6       720
## 7   7      5040
## 8   8     40320
## 9   9    362880
## 10 10   3628800
## 11 11  39916800
## 12 12 479001600

我怀疑你是否有兴趣建立一个包含479,001,600个组件的列表！

现在，在您确切的问题中，您确实拥有名为multiset的内容，这意味着您正在处理multiset permutations，这会减少排列的总数。你可能认为这会让你免于笨拙。好吧，再想一想！

多集排列的公式是

permmulti

从您的示例向量x开始，我们有180（factorial(6)/factorial(2)^2），这听起来很糟糕，但让我们尝试将对添加到您的向量中，看看会发生什么：

data.frame(m=paste('2 singles and',2:7,'pairs'),P=factorial(2+2*2:7)/factorial(2)^(2:7));
##                       m            P
## 1 2 singles and 2 pairs          180
## 2 2 singles and 3 pairs         5040
## 3 2 singles and 4 pairs       226800
## 4 2 singles and 5 pairs     14968800
## 5 2 singles and 6 pairs   1362160800
## 6 2 singles and 7 pairs 163459296000

我决定停在7对，因为除此之外，R开始做科学记谱法，这看起来很烦人。

几对并添加4组怎么样？我们来试试吧。

data.frame(m=paste('2 pairs and',1:3,'quartets'),P=factorial(2*2+4*1:3)/(factorial(2)^2*factorial(4)^(1:3)));
##                        m         P
## 1 2 pairs and 1 quartets       420
## 2 2 pairs and 2 quartets    207900
## 3 2 pairs and 3 quartets 378378000

这次我不得不停在3以避免科学记数法。

我尝试使用上述所有内容的观点是，您不能超出当前向量x，并希望使用任何算法;数字太大了。

尽管如上所述，我确实试图为您的问题找到一个解决方案，这对于combinat::permn()范围太大的范围很小的多字符集非常有用，但是它们不是太大而不能完全任何计算机系统都无法处理。我想出了以下递归函数（加上包装函数）：

gpermuteImpl <- function(uf) do.call(rbind,lapply(1:nrow(uf),function(r) { u <- uf$u[r]; if (uf$f[r] == 1L) if (nrow(uf) == 1L) return(u) else uf <- uf[-r,] else uf$f[r] <- uf$f[r]-1L; return(cbind(u,gpermuteImpl(uf))); }));
gpermute <- function(x) unname(gpermuteImpl(data.frame(u=unique(x),f=tabulate(x))));

这实际上会生成矩阵而不是列表。这里有很多循环和rbind()和cbind()，所以它可能不是最有效的设计，但它使代码相当简洁，因为cbind() ing会自动复制在该位置使用该值的所有排列的每个选定值。

以下是对您的向量x的实施演示：

x <- c(1,1,2,2,3,4);
gpermute(x);
##        [,1] [,2] [,3] [,4] [,5] [,6]
##   [1,]    1    1    2    2    3    4
##   [2,]    1    1    2    2    4    3
##   [3,]    1    1    2    3    2    4
##   [4,]    1    1    2    3    4    2
##   [5,]    1    1    2    4    2    3
##   [6,]    1    1    2    4    3    2
##   [7,]    1    1    3    2    2    4
##   [8,]    1    1    3    2    4    2
##   [9,]    1    1    3    4    2    2
##  [10,]    1    1    4    2    2    3
##  [11,]    1    1    4    2    3    2
##  [12,]    1    1    4    3    2    2
##  [13,]    1    2    1    2    3    4
##  [14,]    1    2    1    2    4    3
##  [15,]    1    2    1    3    2    4
##  [16,]    1    2    1    3    4    2
##  [17,]    1    2    1    4    2    3
##  [18,]    1    2    1    4    3    2
##  [19,]    1    2    2    1    3    4
##  [20,]    1    2    2    1    4    3
##  [21,]    1    2    2    3    1    4
##  [22,]    1    2    2    3    4    1
##  [23,]    1    2    2    4    1    3
##  [24,]    1    2    2    4    3    1
##  [25,]    1    2    3    1    2    4
##  [26,]    1    2    3    1    4    2
##  [27,]    1    2    3    2    1    4
##  [28,]    1    2    3    2    4    1
##  [29,]    1    2    3    4    1    2
##  [30,]    1    2    3    4    2    1
##  [31,]    1    2    4    1    2    3
##  [32,]    1    2    4    1    3    2
##  [33,]    1    2    4    2    1    3
##  [34,]    1    2    4    2    3    1
##  [35,]    1    2    4    3    1    2
##  [36,]    1    2    4    3    2    1
##  [37,]    1    3    1    2    2    4
##  [38,]    1    3    1    2    4    2
##  [39,]    1    3    1    4    2    2
##  [40,]    1    3    2    1    2    4
##  [41,]    1    3    2    1    4    2
##  [42,]    1    3    2    2    1    4
##  [43,]    1    3    2    2    4    1
##  [44,]    1    3    2    4    1    2
##  [45,]    1    3    2    4    2    1
##  [46,]    1    3    4    1    2    2
##  [47,]    1    3    4    2    1    2
##  [48,]    1    3    4    2    2    1
##  [49,]    1    4    1    2    2    3
##  [50,]    1    4    1    2    3    2
##  [51,]    1    4    1    3    2    2
##  [52,]    1    4    2    1    2    3
##  [53,]    1    4    2    1    3    2
##  [54,]    1    4    2    2    1    3
##  [55,]    1    4    2    2    3    1
##  [56,]    1    4    2    3    1    2
##  [57,]    1    4    2    3    2    1
##  [58,]    1    4    3    1    2    2
##  [59,]    1    4    3    2    1    2
##  [60,]    1    4    3    2    2    1
##  [61,]    2    1    1    2    3    4
##  [62,]    2    1    1    2    4    3
##  [63,]    2    1    1    3    2    4
##  [64,]    2    1    1    3    4    2
##  [65,]    2    1    1    4    2    3
##  [66,]    2    1    1    4    3    2
##  [67,]    2    1    2    1    3    4
##  [68,]    2    1    2    1    4    3
##  [69,]    2    1    2    3    1    4
##  [70,]    2    1    2    3    4    1
##  [71,]    2    1    2    4    1    3
##  [72,]    2    1    2    4    3    1
##  [73,]    2    1    3    1    2    4
##  [74,]    2    1    3    1    4    2
##  [75,]    2    1    3    2    1    4
##  [76,]    2    1    3    2    4    1
##  [77,]    2    1    3    4    1    2
##  [78,]    2    1    3    4    2    1
##  [79,]    2    1    4    1    2    3
##  [80,]    2    1    4    1    3    2
##  [81,]    2    1    4    2    1    3
##  [82,]    2    1    4    2    3    1
##  [83,]    2    1    4    3    1    2
##  [84,]    2    1    4    3    2    1
##  [85,]    2    2    1    1    3    4
##  [86,]    2    2    1    1    4    3
##  [87,]    2    2    1    3    1    4
##  [88,]    2    2    1    3    4    1
##  [89,]    2    2    1    4    1    3
##  [90,]    2    2    1    4    3    1
##  [91,]    2    2    3    1    1    4
##  [92,]    2    2    3    1    4    1
##  [93,]    2    2    3    4    1    1
##  [94,]    2    2    4    1    1    3
##  [95,]    2    2    4    1    3    1
##  [96,]    2    2    4    3    1    1
##  [97,]    2    3    1    1    2    4
##  [98,]    2    3    1    1    4    2
##  [99,]    2    3    1    2    1    4
## [100,]    2    3    1    2    4    1
## [101,]    2    3    1    4    1    2
## [102,]    2    3    1    4    2    1
## [103,]    2    3    2    1    1    4
## [104,]    2    3    2    1    4    1
## [105,]    2    3    2    4    1    1
## [106,]    2    3    4    1    1    2
## [107,]    2    3    4    1    2    1
## [108,]    2    3    4    2    1    1
## [109,]    2    4    1    1    2    3
## [110,]    2    4    1    1    3    2
## [111,]    2    4    1    2    1    3
## [112,]    2    4    1    2    3    1
## [113,]    2    4    1    3    1    2
## [114,]    2    4    1    3    2    1
## [115,]    2    4    2    1    1    3
## [116,]    2    4    2    1    3    1
## [117,]    2    4    2    3    1    1
## [118,]    2    4    3    1    1    2
## [119,]    2    4    3    1    2    1
## [120,]    2    4    3    2    1    1
## [121,]    3    1    1    2    2    4
## [122,]    3    1    1    2    4    2
## [123,]    3    1    1    4    2    2
## [124,]    3    1    2    1    2    4
## [125,]    3    1    2    1    4    2
## [126,]    3    1    2    2    1    4
## [127,]    3    1    2    2    4    1
## [128,]    3    1    2    4    1    2
## [129,]    3    1    2    4    2    1
## [130,]    3    1    4    1    2    2
## [131,]    3    1    4    2    1    2
## [132,]    3    1    4    2    2    1
## [133,]    3    2    1    1    2    4
## [134,]    3    2    1    1    4    2
## [135,]    3    2    1    2    1    4
## [136,]    3    2    1    2    4    1
## [137,]    3    2    1    4    1    2
## [138,]    3    2    1    4    2    1
## [139,]    3    2    2    1    1    4
## [140,]    3    2    2    1    4    1
## [141,]    3    2    2    4    1    1
## [142,]    3    2    4    1    1    2
## [143,]    3    2    4    1    2    1
## [144,]    3    2    4    2    1    1
## [145,]    3    4    1    1    2    2
## [146,]    3    4    1    2    1    2
## [147,]    3    4    1    2    2    1
## [148,]    3    4    2    1    1    2
## [149,]    3    4    2    1    2    1
## [150,]    3    4    2    2    1    1
## [151,]    4    1    1    2    2    3
## [152,]    4    1    1    2    3    2
## [153,]    4    1    1    3    2    2
## [154,]    4    1    2    1    2    3
## [155,]    4    1    2    1    3    2
## [156,]    4    1    2    2    1    3
## [157,]    4    1    2    2    3    1
## [158,]    4    1    2    3    1    2
## [159,]    4    1    2    3    2    1
## [160,]    4    1    3    1    2    2
## [161,]    4    1    3    2    1    2
## [162,]    4    1    3    2    2    1
## [163,]    4    2    1    1    2    3
## [164,]    4    2    1    1    3    2
## [165,]    4    2    1    2    1    3
## [166,]    4    2    1    2    3    1
## [167,]    4    2    1    3    1    2
## [168,]    4    2    1    3    2    1
## [169,]    4    2    2    1    1    3
## [170,]    4    2    2    1    3    1
## [171,]    4    2    2    3    1    1
## [172,]    4    2    3    1    1    2
## [173,]    4    2    3    1    2    1
## [174,]    4    2    3    2    1    1
## [175,]    4    3    1    1    2    2
## [176,]    4    3    1    2    1    2
## [177,]    4    3    1    2    2    1
## [178,]    4    3    2    1    1    2
## [179,]    4    3    2    1    2    1
## [180,]    4    3    2    2    1    1

我们可以证明结果与unique(permn(x))使用以下代码返回的结果相同，不幸的是必须稍微涉及，因为（1）我们有列表与矩阵类型不匹配，以及（ 2）两种解决方案之间的排列顺序恰好不同。

library('combinat');
mcombinat <- do.call(rbind,unique(permn(x)));
mcombinat.sorted <- mcombinat[do.call(order,lapply(1:ncol(mcombinat),function(c) mcombinat[,c])),];
mbgoldst <- gpermute(x);
identical(mcombinat.sorted,mbgoldst);
## [1] TRUE

最后，让我演示2对和2个四重奏输入的代码：

x2 <- c(1,1,1,1,2,2,2,2,3,3,4,4);
system.time({ m <- gpermute(x2); });
##    user  system elapsed
##  36.547   0.000  36.593
head(m);
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]    1    1    1    1    2    2    2    2    3     3     4     4
## [2,]    1    1    1    1    2    2    2    2    3     4     3     4
## [3,]    1    1    1    1    2    2    2    2    3     4     4     3
## [4,]    1    1    1    1    2    2    2    2    4     3     3     4
## [5,]    1    1    1    1    2    2    2    2    4     3     4     3
## [6,]    1    1    1    1    2    2    2    2    4     4     3     3
nrow(m);
## [1] 207900

所以花了一些时间，但它已经完成了。我尝试运行unique(permn(x2))，但在几十分钟之后它还没有完成，我认为我们可能认为它永远不会完成，因为它必须生成479,001,600个非唯一排列，{{1那么就有了制作独特的任务。快速计算还表明它需要大约46GB的RAM，这几乎是我系统上可用的24GB RAM的两倍。这也可能是一个问题...

Answer 2

使用RccpAlgos包。

> library(RcppAlgos)
> x <- permuteGeneral(c(1,2,3,4), freqs = c(2,2,1,1))
> dim(x)
[1] 180   6
> head(x)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    3    4
[2,]    1    1    2    2    4    3
[3,]    1    1    2    3    2    4
[4,]    1    1    2    3    4    2
[5,]    1    1    2    4    2    3
[6,]    1    1    2    4    3    2

Answer 3

我不知道这是否更快，瓶颈可能是问题的高维度，因为x会再次出现：

library(gtools)
x=c(1,1,2,2,3,4)
order = permutations(n=length(x), r=length(x))
x = matrix(x[order], ncol=6)
x = x[!duplicated(x), ]

具有重复的列表的不同排列

3 个答案: