Generating permutations of a specific length with repetition in R?

Time: 2017-12-12 20:37:35

Tags: r permutation

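I have a list of words and I need to generate all permutations with repetition. The permutation length must be specified. The word list is fairly large (about 30 words), so the function I use also needs to be efficient. Example: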

wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

I need to generate all permutations such that each one has exactly 3 words. That would be ["alice", "moon", "walks"], ["alice", "walks", "moon"], ["moon", "alice", "walks"], and so on.

3 Answers:

Answer 0: (score: 2)

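There are a couple of packages that do exactly what you need. Let's start with the classic gtools. Also, judging by the example the OP provided, we are looking for permutations without repetition, not combinations with repetition.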

wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

library(gtools)
attempt1 <- permutations(length(wordsList), 3, wordsList)
head(attempt1)
        [,1]    [,2]     [,3]    
[1,] "alice" "bravo"  "guitar"
[2,] "alice" "bravo"  "mars"  
[3,] "alice" "bravo"  "moon"  
[4,] "alice" "bravo"  "sings" 
[5,] "alice" "bravo"  "walks" 
[6,] "alice" "guitar" "bravo"

Then there is iterpc

library(iterpc)
attempt2 <- getall(iterpc(length(wordsList), 3, labels = wordsList, ordered = TRUE))
head(attempt2)
        [,1]    [,2]    [,3]    
[1,] "alice" "moon"  "walks" 
[2,] "alice" "moon"  "mars"  
[3,] "alice" "moon"  "sings" 
[4,] "alice" "moon"  "guitar"
[5,] "alice" "moon"  "bravo" 
[6,] "alice" "walks" "moon"

And finally, RcppAlgos (of which I am the author)

library(RcppAlgos)
attempt3 <- permuteGeneral(wordsList, 3)
head(attempt3)
        [,1]     [,2]     [,3]    
[1,] "alice"  "bravo"  "guitar"
[2,] "bravo"  "alice"  "guitar"
[3,] "guitar" "alice"  "bravo" 
[4,] "alice"  "guitar" "bravo" 
[5,] "bravo"  "guitar" "alice" 
[6,] "guitar" "bravo"  "alice"

They are all fairly efficient and produce the same set of rows, only in a different order

identical(attempt1[do.call(order,as.data.frame(attempt1)),],
          attempt3[do.call(order,as.data.frame(attempt3)),])
[1] TRUE

identical(attempt1[do.call(order,as.data.frame(attempt1)),],
          attempt2[do.call(order,as.data.frame(attempt2)),])
[1] TRUE
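As a quick sanity check on the size of these results, the number of permutations can be computed directly. The sketch below uses only base R; the variable names are illustrative and not taken from any of the packages above.

```r
n <- 7   # length(wordsList) in the example
k <- 3   # words per permutation

# Without repetition: n! / (n - k)! = 7 * 6 * 5
no_rep <- factorial(n) / factorial(n - k)

# With repetition: every position can hold any of the n words
with_rep <- n^k

# The same counts for the 30-word list mentioned in the question
no_rep_30 <- prod(30:28)
with_rep_30 <- 30^3

c(no_rep = no_rep, with_rep = with_rep,
  no_rep_30 = no_rep_30, with_rep_30 = with_rep_30)
```

Each of the three matrices above should therefore have 210 rows, and even the 30-word case stays small enough to hold in memory.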

If you really do want permutations with repetition, each of these functions provides an argument for that.
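To my knowledge the relevant arguments are repeats.allowed = TRUE in gtools::permutations, replace = TRUE in iterpc, and repetition = TRUE in permuteGeneral. For comparison, here is a minimal base-R sketch that builds the same set of rows with expand.grid. It is not what any of these packages does internally, and the row order will differ from theirs.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
k <- 3

# One copy of the word list per position, then every combination of positions.
grid <- expand.grid(rep(list(wordsList), k), stringsAsFactors = FALSE)

# Convert to a plain character matrix without dimnames.
perms_rep <- unname(as.matrix(grid))

dim(perms_rep)
head(perms_rep)
```

This is fine for small inputs, but it materializes all n^k rows at once, so it does not scale to long word lists.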

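Since the OP is actually working with a wordsList of more than 3000 words and is looking for all permutations chosen 15 at a time, the methods above will fail. There are some alternatives, from iterpc as well as RcppAlgos.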

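With iterpc, you can use the function getnext to generate successive permutations one at a time. I doubt that you will be able to generate them all in a reasonable amount of time, or store them in one place (i.e. assuming each cell takes 8 bytes, 10^52 * 15 * 8/(2^80) > 10^29 YiB. Yes, those are yobibytes. Translation: "a lot of data").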
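The storage estimate can be reproduced in base R. This is a rough sketch that works on the log scale; the 8 bytes per cell is the same simplifying assumption made above, not the real size of an R character matrix.

```r
n <- 3000   # words in the list
k <- 15     # words per permutation

# log10 of the count without repetition, n! / (n - k)!
log10_count <- sum(log10((n - k + 1):n))

# k cells per row, 8 bytes per cell, expressed in YiB (2^80 bytes)
log10_yib <- log10_count + log10(k * 8) - 80 * log10(2)

log10_count   # a little above 52, i.e. more than 10^52 permutations
log10_yib     # comfortably above 29
```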

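With RcppAlgos, you can use the upper argument (rowCap in older versions of the package) to output a specific number of permutations, up to 2^31 - 1. E.g.: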

permuteGeneral(wordsList, 3, upper = 5)
        [,1]     [,2]     [,3]    
[1,] "alice"  "bravo"  "guitar"
[2,] "bravo"  "alice"  "guitar"
[3,] "guitar" "alice"  "bravo" 
[4,] "alice"  "guitar" "bravo" 
[5,] "bravo"  "guitar" "alice"
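When the full result cannot be stored, another option is to compute a single arrangement directly from its position. The following is a minimal base-R sketch for the with-repetition case only; nth_perm_rep is a hypothetical helper written for illustration and is not part of RcppAlgos or iterpc. It treats the position as a k-digit number in base n, so it stays exact only while n^k is below 2^53.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

# Hypothetical helper: the idx-th (1-based) arrangement of k items drawn
# from v with repetition, in the order where the last position varies fastest.
nth_perm_rep <- function(v, k, idx) {
  n <- length(v)
  stopifnot(idx >= 1, idx <= n^k)
  i <- idx - 1            # switch to a 0-based position
  pos <- integer(k)
  for (j in k:1) {        # peel off base-n digits, last position first
    pos[j] <- i %% n + 1
    i <- i %/% n
  }
  v[pos]
}

nth_perm_rep(wordsList, 3, 1)
nth_perm_rep(wordsList, 3, 2)
nth_perm_rep(wordsList, 3, 343)
```

Calling it for a range of positions gives a slice of the full set without building the rest.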

Answer 1: (score: 0)

You can use the combn function from the utils package.

wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
combn(wordsList, 3)

This gives a long output that I don't want to reproduce here. You could also pass the input as a factor, which might help with speed.
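Note that combn returns combinations, where order is ignored, so for 7 words taken 3 at a time it yields 35 columns rather than 210 ordered triples. As a rough sketch, each combination can be expanded into its 6 orderings in base R. The orders matrix below is hard-coded for length 3 and is an illustration only, not part of this answer's original code.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

combos <- combn(wordsList, 3)   # 3 x 35 matrix, one combination per column

# All 3! = 6 orderings of three positions, one per row
orders <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

# For every combination, emit its 6 orderings as a 6 x 3 block
perms <- do.call(rbind, lapply(seq_len(ncol(combos)), function(j) {
  matrix(combos[orders, j], ncol = 3)
}))

dim(perms)
head(perms)
```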

Answer 2: (score: 0)

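This is how to really generate combinations with repetition; Joseph Wood's solution is about permutations without repetition. (Edit: although the OP wrote combinations with repetition, he probably meant permutations!? See the comments.)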

library(iterpc)
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
getall(iterpc(length(wordsList), 3, labels = wordsList, replace = TRUE))
#>       [,1]     [,2]     [,3]    
#>  [1,] "alice"  "alice"  "alice" 
#>  [2,] "alice"  "alice"  "moon"  
#>  [3,] "alice"  "alice"  "walks" 
#>  [4,] "alice"  "alice"  "mars"  
#>  [5,] "alice"  "alice"  "sings" 
#>  [6,] "alice"  "alice"  "guitar"
#>  [7,] "alice"  "alice"  "bravo" 
#>  [8,] "alice"  "moon"   "moon"  
#>  [9,] "alice"  "moon"   "walks" 
..
..
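For readers without iterpc, the same combinations with repetition can be sketched in base R with the stars-and-bars mapping: take the ordinary combinations of n + k - 1 positions chosen k at a time and subtract 0, 1, ..., k - 1 from each one. This is an illustrative alternative and is not claimed to be how iterpc works; the variable names are assumptions.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
n <- length(wordsList)
k <- 3

# Strictly increasing index triples from 1..(n + k - 1), one per column
combos <- combn(n + k - 1, k)

# Subtracting 0, 1, ..., k - 1 down each column gives non-decreasing
# index triples from 1..n, i.e. combinations with repetition
idx <- combos - (seq_len(k) - 1)

# Map the indices back to words, one combination per row
combs_rep <- matrix(wordsList[t(idx)], ncol = k)

nrow(combs_rep)        # choose(n + k - 1, k)
head(combs_rep, 9)
```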