Unordered combinations in R

Time: 2015-01-14 22:25:14

Tags: r combinations

I am looking for a function that returns the unordered combinations of a vector. For example:

x<-c('red','blue','black')
uncomb(x)
[1]'red'
[2]'blue'
[3]'black'
[4]'red','blue'
[5]'blue','black'
[6]'red','black'
[7]'red','blue','black'

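I assume some library has a function that does this, but I can't find it. I tried permutations from gtools, but that is not the function I am looking for.

3 Answers:

Answer 0: (score: 14)

You can apply a sequence along the length of x over the m argument of the combn() function.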

x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
# 
# [[2]]
# [1] "blue"
# 
# [[3]]
# [1] "black"
# 
# [[4]]
# [1] "red"  "blue"
# 
# [[5]]
# [1] "red"   "black"
# 
# [[6]]
# [1] "blue"  "black"
# 
# [[7]]
# [1] "red"   "blue"  "black"
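
To match the uncomb(x) call from the question, the one-liner above can be wrapped in a small helper. This is only a sketch: the name uncomb is taken from the question and is hypothetical, not a function from base R or any package.

```r
# Hypothetical helper named after the uncomb() call in the question;
# it is not part of base R or any package.
uncomb <- function(x) {
  # For each size m = 1..length(x), list all size-m combinations,
  # then flatten the per-size lists into a single list.
  do.call(c, lapply(seq_along(x), function(m) combn(x, m, simplify = FALSE)))
}

x <- c("red", "blue", "black")
res <- uncomb(x)
length(res)   # 2^3 - 1 = 7 non-empty combinations
```

One caveat worth keeping in mind: when x is a single positive integer, combn() treats it as seq_len(x), so a length-one numeric input may not behave the way the wrapper suggests.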

If you prefer a matrix result, you can apply stringi::stri_list2matrix() to the list above.

stringi::stri_list2matrix(
    do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE)),
    byrow = TRUE
)
#      [,1]    [,2]    [,3]   
# [1,] "red"   NA      NA     
# [2,] "blue"  NA      NA     
# [3,] "black" NA      NA     
# [4,] "red"   "blue"  NA     
# [5,] "red"   "black" NA     
# [6,] "blue"  "black" NA     
# [7,] "red"   "blue"  "black"
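
If stringi is not installed, a similar NA-padded matrix can probably be built with base R alone. The sketch below assumes that padding each combination to length(x) with the replacement function `length<-` is acceptable; the names combos and padded are only illustrative.

```r
x <- c("red", "blue", "black")
combos <- do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))

# `length<-` extends each vector to length(x), filling the tail with NA;
# sapply() binds the padded vectors as columns, so transpose at the end.
padded <- t(sapply(combos, `length<-`, length(x)))
dim(padded)   # 7 rows (combinations) by 3 columns
```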

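Answer 1: (score: 1)

I was redirected here from List All Combinations With combn, since this question was the dupe target. This is an old question and the answer provided by @RichScriven is very good, but I wanted to give the community a few more options that are arguably more natural and more efficient (the last two).

We first note that the output is very similar to the Power Set. Calling powerSet from the rje package, we see that our output indeed matches every element of the power set except the first, which is equivalent to the Empty Set: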

x <- c("red", "blue", "black")
rje::powerSet(x)
[[1]]
character(0)   ## empty set equivalent

[[2]]
[1] "red"

[[3]]
[1] "blue"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "black"

[[6]]
[1] "red"   "black"

[[7]]
[1] "blue"  "black"

[[8]]
[1] "red"   "blue"  "black"

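If you don't want the first element, you can easily add [-1] to the end of the function call, like so: rje::powerSet(x)[-1]

The next two solutions come from the newer packages arrangements and RcppAlgos (I am the author), which will offer the user greater efficiency. Both of these packages are capable of generating combinations of Multisets.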
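
To make the power set connection concrete without installing rje, here is a minimal base R sketch. The helper name power_set is hypothetical, and this is an illustration of the idea (each subset corresponds to an on/off mask over the elements), not a description of how rje implements powerSet.

```r
# Hypothetical helper; a base R illustration, not rje's implementation.
power_set <- function(x) {
  n <- length(x)
  place <- 2^(seq_len(n) - 1)          # 1, 2, 4, ... one binary digit per element
  lapply(seq_len(2^n) - 1, function(mask) {
    # Element i is kept when binary digit i of mask is 1
    x[(mask %/% place) %% 2 == 1]
  })
}

x <- c("red", "blue", "black")
ps <- power_set(x)
length(ps)       # 2^3 = 8 subsets, including the empty set
ps[-1]           # drop the empty set to keep the 7 non-empty combinations
```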

  

Why is this important?

It can be shown that there is a one-to-one mapping from the power set of A to all combinations of the multiset c(rep(emptyElement, length(A)), A) choose length(A), where emptyElement is a representation of the empty set (like zero or a blank). Note that the calls below supply only two blanks (freq = c(2, rep(1, 3))), so the all-blank row that would stand for the empty set is left out. With this in mind, observe:

library(arrangements)
combinations(x = c("",x), k = 3, freq = c(2, rep(1, 3)))
     [,1]  [,2]   [,3]
[1,] ""    ""     "red"
[2,] ""    ""     "blue"
[3,] ""    ""     "black"
[4,] ""    "red"  "blue"
[5,] ""    "red"  "black"
[6,] ""    "blue" "black"
[7,] "red" "blue" "black"

library(RcppAlgos)
comboGeneral(c("",x), 3, freqs = c(2, rep(1, 3)))
     [,1]    [,2]    [,3]
[1,] ""      ""      "black"
[2,] ""      ""      "blue"
[3,] ""      ""      "red"
[4,] ""      "black" "blue"
[5,] ""      "black" "red"
[6,] ""      "blue"  "red"
[7,] "black" "blue"  "red"

If you don't like dealing with blank elements and/or matrices, you can also return a list by making use of lapply.

lapply(seq_along(x), comboGeneral, v = x)
[[1]]
     [,1]
[1,] "black"
[2,] "blue"
[3,] "red"

[[2]]
     [,1]    [,2]
[1,] "black" "blue"
[2,] "black" "red"
[3,] "blue"  "red"

[[3]]
     [,1]    [,2]   [,3]
[1,] "black" "blue" "red"

lapply(seq_along(x), combinations, n = length(x), x = x)
[[1]]
     [,1]
[1,] "red"
[2,] "blue"
[3,] "black"

[[2]]
     [,1]   [,2]
[1,] "red"  "blue"
[2,] "red"  "black"
[3,] "blue" "black"

[[3]]
     [,1]  [,2]   [,3]
[1,] "red" "blue" "black"

Now we show that the last two methods are more efficient (N.B. I removed do.call(c, and simplify = FALSE from the answer provided by @RichSciven in order to compare generation of similar output. I also included rje::powerSet for good measure):

set.seed(8128)
bigX <- sort(sample(10^6, 20)) ## With this as an input, we will get 2^20 - 1 results.. i.e. 1,048,575
library(microbenchmark)
microbenchmark(powSetRje = rje::powerSet(bigX),
               powSetRich = lapply(seq_along(bigX), combn, x = bigX),
               powSetArrange = lapply(seq_along(bigX), function(y) combinations(x = bigX, k = y)),
               powSetAlgos = lapply(seq_along(bigX), comboGeneral, v = bigX),
               unit = "relative")

Unit: relative
          expr       min        lq      mean    median       uq      max neval
     powSetRje 52.992681 15.055038 11.091203 13.586952 8.860661 7.347368   100
    powSetRich 58.679666 14.864760 10.914700 13.198179 8.675812 6.017437   100
 powSetArrange  1.042766  1.062227  1.071404  1.098491 1.126971 1.044827   100
   powSetAlgos  1.000000  1.000000  1.000000  1.000000 1.000000 1.000000   100
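
The result count quoted in the benchmark setup comment (2^20 - 1) can be checked with a little arithmetic: summing the number of combinations of each size from 1 to n gives every non-empty subset. A quick sketch, with n and sizes as illustrative names:

```r
n <- 20
sizes <- choose(n, seq_len(n))   # number of combinations of each size 1..n
sum(sizes)                       # total number of non-empty combinations
sum(sizes) == 2^n - 1            # TRUE
```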

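Going a step further, arrangements comes equipped with an argument called type, which lets the user choose a particular format for the output. One of these is type = "l", for list. It is similar to setting simplify = FALSE in combn, and it allows us to obtain output like that of powerSet. Observe: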

do.call(c, lapply(seq_along(x), combinations, n = length(x), x = x, type = "l"))
[[1]]
[1] "red"

[[2]]
[1] "blue"

[[3]]
[1] "black"

[[4]]
[1] "red"  "blue"

[[5]]
[1] "red"   "black"

[[6]]
[1] "blue"  "black"

[[7]]
[1] "red"   "blue"  "black"

And the benchmarks:

microbenchmark(powSetRje = rje::powerSet(bigX)[-1],
               powSetRich = do.call(c, lapply(seq_along(bigX), combn, x = bigX, simplify = FALSE)),
               powSetArrange = do.call(c, lapply(seq_along(bigX), combinations, n = length(bigX), x = bigX, type = "l")),
               times = 15, unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq      max neval
     powSetRje 4.925559 4.433365 4.013872 3.893674 3.819344 3.609616    15
    powSetRich 5.732216 4.975508 4.542482 4.564668 4.288592 4.003765    15
 powSetArrange 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    15

Answer 2: (score: 1)

A solution with a matrix result that does not use any external packages:

store <- lapply(
  seq_along(x), 
  function(i) {
    out <- combn(x, i)   # i rows, one column per combination
    # Pad every column with NA rows up to length(x) so the pieces can be bound
    pad <- matrix(NA_character_, nrow = length(x) - i, ncol = NCOL(out))
    rbind(out, pad)
})
t(do.call(cbind, store))

     [,1]    [,2]    [,3]   
[1,] "red"   NA      NA     
[2,] "blue"  NA      NA     
[3,] "black" NA      NA     
[4,] "red"   "blue"  NA     
[5,] "red"   "black" NA     
[6,] "blue"  "black" NA     
[7,] "red"   "blue"  "black"