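I have a list of words and I need to generate all permutations with repetition. The permutation length must be specified. The word list is fairly large (about 30 words), so the function also needs to be efficient. Example: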
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
I need to generate all permutations in which each permutation has exactly 3 words. That would be ["alice", "moon", "walks"], ["alice", "walks", "moon"], ["moon", "alice", "walks"], etc.
Answer 0 (score: 2)
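Several packages do exactly what you need. Let's start with the classic gtools. Also, judging from the example the OP provided, we are looking for permutations without repetition rather than combinations with repetition.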
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
library(gtools)
attempt1 <- permutations(length(wordsList), 3, wordsList)
head(attempt1)
[,1] [,2] [,3]
[1,] "alice" "bravo" "guitar"
[2,] "alice" "bravo" "mars"
[3,] "alice" "bravo" "moon"
[4,] "alice" "bravo" "sings"
[5,] "alice" "bravo" "walks"
[6,] "alice" "guitar" "bravo"
Then there is iterpc.
library(iterpc)
attempt2 <- getall(iterpc(length(wordsList), 3, labels = wordsList, ordered = TRUE))
head(attempt2)
[,1] [,2] [,3]
[1,] "alice" "moon" "walks"
[2,] "alice" "moon" "mars"
[3,] "alice" "moon" "sings"
[4,] "alice" "moon" "guitar"
[5,] "alice" "moon" "bravo"
[6,] "alice" "walks" "moon"
Finally, RcppAlgos (I am its author):
library(RcppAlgos)
attempt3 <- permuteGeneral(wordsList, 3)
head(attempt3)
[,1] [,2] [,3]
[1,] "alice" "bravo" "guitar"
[2,] "bravo" "alice" "guitar"
[3,] "guitar" "alice" "bravo"
[4,] "alice" "guitar" "bravo"
[5,] "bravo" "guitar" "alice"
[6,] "guitar" "bravo" "alice"
All of them are quite efficient and produce the same set of results, just in a different row order:
identical(attempt1[do.call(order,as.data.frame(attempt1)),],
attempt3[do.call(order,as.data.frame(attempt3)),])
[1] TRUE
identical(attempt1[do.call(order,as.data.frame(attempt1)),],
attempt2[do.call(order,as.data.frame(attempt2)),])
[1] TRUE
If you really do want permutations with repetition, each of these functions provides an argument for that.
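For reference, the same result can be sketched in base R without any package. This is only an illustration, not the method the packages use internally: it builds the Cartesian product with expand.grid, and the names grid and permsRep are made up for the example. In the packages themselves, the relevant arguments are, as far as I know, repeats.allowed = TRUE in gtools::permutations, replace = TRUE in iterpc, and repetition = TRUE in permuteGeneral; check each package's documentation before relying on them.

```r
# Base R sketch: permutations WITH repetition via a Cartesian product.
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
r <- 3

# One copy of the word list per position; expand.grid forms every combination
# of choices, with the FIRST column varying fastest.
grid <- expand.grid(rep(list(wordsList), r), stringsAsFactors = FALSE)

# Reverse the columns so the first position varies slowest, then drop dimnames
# to get a plain character matrix like the package outputs above.
permsRep <- unname(as.matrix(grid[, r:1]))

# There should be n^r rows in total.
nrow(permsRep) == length(wordsList)^r
head(permsRep, 3)
```

This is fine for 7 words, and still manageable for 30 words taken 3 at a time (30^3 = 27000 rows), but the row count grows as n^r, so it does not scale to long permutations.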
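Since the OP is actually working with a wordsList of more than 3000 words and is looking for all permutations choosing 15 at a time, the methods above will fail. There are some alternatives, from iterpc as well as RcppAlgos.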
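With iterpc, you can use the function getnext to produce successive permutations one at a time. I doubt you would be able to generate them all in a reasonable amount of time or store them in one place (i.e., assuming each cell takes 8 bytes, 10^52 * 15 * 8 / (2^80) > 10^29 YiB; yes, those are yobibytes, which translates to "a lot of data").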
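The back-of-the-envelope estimate above can be reproduced in base R. This is a rough sketch under the same assumptions as the text (3000 words, 15 per permutation, no repetition, 8 bytes per cell); the variable names are made up for the example, and everything is done in log10 to keep the numbers readable.

```r
# Rough size estimate for all permutations of 3000 words taken 15 at a time.
n <- 3000
r <- 15

# Number of permutations without repetition is n! / (n - r)!,
# i.e. the product n * (n - 1) * ... * (n - r + 1). Sum the logs instead.
log10Perms <- sum(log10((n - r + 1):n))

# Storage: r cells per permutation, 8 bytes per cell (assumption from the text),
# converted to yobibytes (1 YiB = 2^80 bytes).
log10YiB <- log10Perms + log10(r * 8) - 80 * log10(2)

log10Perms  # order of magnitude of the number of permutations
log10YiB    # order of magnitude of the storage in YiB
```

Both values come out as exponents of 10, so a result a little above 52 for log10Perms corresponds to the "10^52" figure quoted above.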
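With RcppAlgos, you can use the upper argument (named rowCap in earlier versions of the package) to output a specific number of permutations, up to 2^31 - 1. E.g.: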
permuteGeneral(wordsList, 3, upper = 5)
[,1] [,2] [,3]
[1,] "alice" "bravo" "guitar"
[2,] "bravo" "alice" "guitar"
[3,] "guitar" "alice" "bravo"
[4,] "alice" "guitar" "bravo"
[5,] "bravo" "guitar" "alice"
Answer 1 (score: 0)
You can use the combn function from the utils package.
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
combn(wordsList, 3)
This gives a long output that I don't want to reproduce here. You can also pass the input as a factor, which might help with speed.
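One thing worth keeping in mind: combn returns combinations (one per column, order ignored), not permutations. A minimal base R sketch of how the two relate is below; it is only an illustration, and the helper names ord and perms are made up here. It reorders every combination in all 3! = 6 possible ways to recover the permutations without repetition.

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")

# combn gives a 3 x choose(7, 3) matrix: one unordered selection per column.
combos <- combn(wordsList, 3)
dim(combos)

# All orderings of the positions 1:3 (rows of the grid with no repeated index).
ord <- as.matrix(expand.grid(1:3, 1:3, 1:3))
ord <- ord[apply(ord, 1, function(p) length(unique(p)) == 3), , drop = FALSE]

# Apply each ordering to every combination and stack the results row-wise.
perms <- do.call(rbind, lapply(seq_len(nrow(ord)), function(i) {
  t(combos[ord[i, ], , drop = FALSE])
}))

# choose(7, 3) * 3! rows, i.e. 7 * 6 * 5 permutations without repetition.
nrow(perms) == choose(7, 3) * factorial(3)
```

The dedicated permutation functions in the other answer do this far more efficiently; the sketch is only meant to show why the combn output has fewer entries than the OP's example suggests.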
Answer 2 (score: 0)
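To really generate combinations with repetition (Joseph Wood's solution is about permutations without repetition), use iterpc with replace = TRUE. (Edit: although the OP wrote combinations with repetition, he probably meant permutations!? See the comments.)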
library(iterpc)
wordsList = c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
getall(iterpc(length(wordsList), 3, labels = wordsList, replace = TRUE))
#> [,1] [,2] [,3]
#> [1,] "alice" "alice" "alice"
#> [2,] "alice" "alice" "moon"
#> [3,] "alice" "alice" "walks"
#> [4,] "alice" "alice" "mars"
#> [5,] "alice" "alice" "sings"
#> [6,] "alice" "alice" "guitar"
#> [7,] "alice" "alice" "bravo"
#> [8,] "alice" "moon" "moon"
#> [9,] "alice" "moon" "walks"
..
..
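As a sanity check on the size of that output, the number of combinations with repetition of n items taken r at a time is choose(n + r - 1, r). A base R sketch (no iterpc needed, names such as idx and combosRep are made up for the example) that enumerates them by brute force and compares the count:

```r
wordsList <- c("alice", "moon", "walks", "mars", "sings", "guitar", "bravo")
n <- length(wordsList)
r <- 3

# All index tuples; reverse the columns so the first position varies slowest.
idx <- expand.grid(rep(list(seq_len(n)), r))
idx <- idx[, r:1]

# A combination with repetition corresponds to a non-decreasing index tuple.
keep <- rep(TRUE, nrow(idx))
for (j in seq_len(r - 1)) {
  keep <- keep & idx[[j]] <= idx[[j + 1]]
}

# Map the kept index tuples back to words.
combosRep <- matrix(wordsList[as.matrix(idx[keep, ])], ncol = r)

# The brute-force count should agree with the closed-form formula.
nrow(combosRep) == choose(n + r - 1, r)
head(combosRep, 3)
```

Filtering the full n^r grid is wasteful for anything large, so this is only useful for checking small cases.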