Question

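I have a list of integers, for example: (1,2,3,4,5)

I want to get all possible lists of size 5, where:

1. A list may contain repeated elements, e.g. (1,1,1,2,2)

2. Order does not matter, e.g. (1,1,2,2,1) is the same as (1,1,1,2,2)

How can I get this complete list? What I am actually after is the 10-element combinations (with repetition) of 10 integers.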
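For scale: the number of size-k multisets drawn from n distinct values is the binomial coefficient C(n + k - 1, k), which base R's choose() computes. The sketch below wraps it in a hypothetical helper, n_multisets(), purely for illustration.

```r
# Hypothetical helper: number of k-element multisets
# (combinations with repetition) drawn from n distinct values
n_multisets <- function(n, k) choose(n + k - 1, k)

n_multisets(5, 5)    # 126
n_multisets(10, 10)  # 92378
```

So the 10-of-10 case has only 92378 distinct results, whereas enumerating every ordered arrangement first would mean 10^10 rows.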

Answer 1

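The link Gregor provided seems to rely entirely on third-party packages to generate multisets, so I wanted to give you a base R solution. Note that for very large datasets, the packages mentioned in that link will almost certainly be more efficient.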

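We can first use expand.grid() to generate all possible arrangements of the elements of (1,2,3,4,5), with repetition. At this stage, different orderings are still treated as distinct. We then want to remove the "extra" combinations that contain the same elements in a different order, which we can do with apply() and duplicated().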

If you use the multichoose calculator here, you will see that the code below produces the correct number of combinations. Here is the code:

x <- 1:5

df <- expand.grid(x, x, x, x, x) # generates all 5^5 = 3125 arrangements, allowing repetition

index <- !duplicated(t(apply(df, 1, sort))) # sort each row, then flag rows whose sorted form was already seen
df <- df[index, ] # keep only the first occurrence of each multiset

# check number of rows. It should be 126; one for each combination
nrow(df)

# Output
# [1] 126

# Quick look at part of the dataframe:

head(df)
  Var1 Var2 Var3 Var4 Var5
1    1    1    1    1    1
2    2    1    1    1    1
3    3    1    1    1    1
4    4    1    1    1    1
5    5    1    1    1    1
7    2    2    1    1    1
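Because this approach first materialises all n^k arrangements, the 10-of-10 case from the question would need 10^10 rows before deduplication. A base R alternative that avoids this is the standard stars-and-bars bijection: pick k distinct positions c1 < ... < ck from 1..(n + k - 1) with combn(), then subtract j - 1 from the j-th position, which yields every nondecreasing k-tuple over 1..n exactly once. This is a sketch, not the answer author's method; the helper name multisets() is hypothetical and it assumes x contains distinct values.

```r
# Hypothetical helper: all k-element multisets of x, one sorted multiset per row.
# Stars and bars: strictly increasing k-subsets of 1..(n + k - 1) map one-to-one
# onto nondecreasing k-tuples over 1..n by subtracting (j - 1) from column j.
multisets <- function(x, k) {
  n <- length(x)
  idx <- t(combn(n + k - 1, k))                  # one k-subset per row, lexicographic
  pos <- idx - rep(0:(k - 1), each = nrow(idx))  # column j minus (j - 1)
  matrix(x[as.vector(pos)], ncol = k)            # map positions back to values of x
}

res <- multisets(1:5, 5)
nrow(res)  # 126
head(res, 3)
```

multisets(1:10, 10) returns the 92378 rows for the 10-of-10 case directly. combn() still builds every result in memory at the R level, so for much larger inputs the compiled packages mentioned above remain the better choice.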

Answer 2

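Using the RcppAlgos solution recommended in this answer: we want to choose sets of 5 elements from the input, with repetition, where order does not matter (hence comboGeneral(); we would use permuteGeneral() if order mattered). Because it is written in C++, this is a very fast solution, and the benchmarks in the linked answer also found it to be memory efficient. Generating the sets for 10 multichoose 10 still takes under a second on my laptop.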

library(RcppAlgos)
x = 1:5
result = comboGeneral(x, m = 5, repetition = TRUE)
dim(result)
# [1] 126   5
head(result)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    1    1    1    1
# [2,]    1    1    1    1    2
# [3,]    1    1    1    1    3
# [4,]    1    1    1    1    4
# [5,]    1    1    1    1    5
# [6,]    1    1    1    2    2

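For the 10-of-10 case in the question, the same call should scale directly. The sketch below assumes the RcppAlgos package is installed (it does nothing otherwise) and checks the row count against choose(10 + 10 - 1, 10).

```r
# Assumes the RcppAlgos package is installed; the block is skipped otherwise.
if (requireNamespace("RcppAlgos", quietly = TRUE)) {
  big <- RcppAlgos::comboGeneral(1:10, m = 10, repetition = TRUE)
  # 10 multichoose 10 = choose(19, 10) rows, each with 10 columns
  stopifnot(nrow(big) == choose(19, 10), ncol(big) == 10)
  print(dim(big))
}
```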
Answer 3

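As a tidyverse take on @MarcusCampbell's approach, we can use expand to enumerate all possible combinations, then keep only the distinct combinations, i.e. those that are invariant under permutation (order does not matter):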

library(tidyverse)
tibble(V1 = 1:5, V2 = 1:5, V3 = 1:5, V4 = 1:5, V5 = 1:5) %>%
    expand(V1, V2, V3, V4, V5) %>%
    rowwise() %>%
    mutate(cmbn = paste(sort(c(V1, V2, V3, V4, V5)), collapse = ",")) %>%
    distinct(cmbn)
    ## A tibble: 126 x 1
    #   cmbn
    #   <chr>
    # 1 1,1,1,1,1
    # 2 1,1,1,1,2
    # 3 1,1,1,1,3
    # 4 1,1,1,1,4
    # 5 1,1,1,1,5
    # 6 1,1,1,2,2
    # 7 1,1,1,2,3
    # 8 1,1,1,2,4
    # 9 1,1,1,2,5
    #10 1,1,1,3,3
    ## ... with 116 more rows
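The rowwise() and paste() step builds one string per row, which gets slow for large grids. Since each multiset has exactly one nondecreasing arrangement, an alternative is to keep only the grid rows that are already sorted, using vectorised column comparisons. The sketch below does this in base R so it has no package dependencies; sorted_rows() is a hypothetical helper name. Like the expand() approach, it still enumerates all n^k rows first.

```r
# Full grid of 5^5 arrangements (V1 varies fastest)
grid <- expand.grid(V1 = 1:5, V2 = 1:5, V3 = 1:5, V4 = 1:5, V5 = 1:5)

# Hypothetical helper: TRUE for rows with col1 <= col2 <= ... <= colk.
# Map() compares each column with the next one; Reduce(`&`) combines the results.
sorted_rows <- function(df) {
  k <- ncol(df)
  Reduce(`&`, Map(`<=`, df[-k], df[-1]))
}

result <- grid[sorted_rows(grid), ]
nrow(result)  # 126
```

In the tidyverse pipeline, the equivalent would be filter(V1 <= V2, V2 <= V3, V3 <= V4, V4 <= V5) in place of rowwise(), mutate() and distinct(), which also keeps the values as integer columns instead of a single string.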

Getting distinct combinations with repetition in R
