Question

I want to create all possible permutations of a vector containing two distinct values, where I control the proportion of each value.

For example, if I have a vector of length 3 and I want all possible combinations containing a single 1, my desired output is a list that looks like this:

list.1 <- list(c(1,0,0), c(0,1,0), c(0,0,1))

Conversely, if I want all possible combinations containing three 1s, my desired output is a list that looks like this:

list.3 <- list(c(1,1,1))

In other words, the pattern of 1 and 0 values matters, but all 1s should be treated as identical to one another.

Based on searching here and elsewhere, I have tried several approaches:

library(combinat)           # provides permn()

expand.grid(0:1, 0:1, 0:1)  # this includes all possible combinations of 0, 1, 2, or 3 ones
permn(c(0,1,1))             # this does not treat the ones as identical (e.g. it produces (0,1,1) twice)
unique(permn(c(0,1,1)))     # this does the job!

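So the function permn from the package combinat seemed promising. However, when I scaled it up to my actual problem (a vector of length 20, with 50% 1s and 50% 0s), I ran into a problem: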

unique(permn(c(rep(1,10), rep(0, 10))))

# returns the error:
Error in vector("list", gamma(n + 1)) : 
  vector size specified is too large

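My understanding is that this happens because permn, when called, builds a list of all possible permutations, even though many of them are identical, and that list is too large for R to handle.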

Does anyone have a suggestion for how to get around this?

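Apologies if this has been answered before. There are many, many SO questions with similar wording but different problems, and I could not find a solution that meets my needs!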

Answer 1

It shouldn't be a deal-breaker that expand.grid includes all the permutations. Just subset the result afterwards:

combinations <- function(size, choose) {

  d <- do.call("expand.grid", rep(list(0:1), size))
  d[rowSums(d) == choose,]

}

combinations(size=10, choose=3)
#    Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10
# 8     1    1    1    0    0    0    0    0    0     0
# 12    1    1    0    1    0    0    0    0    0     0
# 14    1    0    1    1    0    0    0    0    0     0
# 15    0    1    1    1    0    0    0    0    0     0
# 20    1    1    0    0    1    0    0    0    0     0
# 22    1    0    1    0    1    0    0    0    0     0
...
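
As a rough sanity check on how this scales to the question's case (length 20, ten 1s), the sketch below repeats the helper so it runs on its own, verifies the filter on a small input, and then prints the two relevant row counts. The variable name `small` is only illustrative.

```r
# Same helper as above, repeated so this snippet is self-contained.
combinations <- function(size, choose) {
  d <- do.call("expand.grid", rep(list(0:1), size))
  d[rowSums(d) == choose, ]
}

# Small case: the filter should keep exactly choose(5, 2) rows.
small <- combinations(size = 5, choose = 2)
nrow(small) == choose(5, 2)

# Question's case: rows expand.grid builds before filtering,
# versus rows that survive the filter.
2^20
choose(20, 10)
```

So expand.grid would build about a million rows (2^20 = 1,048,576) and keep 184,756 of them, which is a far smaller intermediate result than the factorial(20) permutations that permn attempts.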

Answer 2

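The problem is indeed that you are initially computing all factorial(20) (~10^18) permutations, which will not fit in your memory. What you are looking for is an efficient way to compute the permutations of a multiset. The multicool package can do this: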

library(multicool)

res <- allPerm(initMC(c(rep(0,10),rep(1,10) )))

This computation took about two minutes on my laptop, but it is definitely feasible.
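
If installing an extra package is not an option, a possible alternative (not taken from either answer above) is to choose the positions of the 1s directly with base R's combn, so that only the distinct vectors are ever created. The helper name `binary_perms` and its arguments are hypothetical, and the sketch assumes 1 <= n_ones <= size.

```r
# Hypothetical helper: build each 0/1 vector from the positions of its ones.
# Assumes 1 <= n_ones <= size.
binary_perms <- function(size, n_ones) {
  combn(size, n_ones, FUN = function(idx) {
    v <- numeric(size)  # start with all zeros
    v[idx] <- 1         # set the chosen positions to 1
    v
  }, simplify = FALSE)  # return a list of vectors rather than a matrix
}

# Small cases from the question
binary_perms(3, 1)
binary_perms(3, 3)

# Question's real case: one list element per distinct arrangement
res <- binary_perms(20, 10)
length(res) == choose(20, 10)
```

For the length-3 examples this should give the same vectors as list.1 and list.3 in the question, and for the length-20 case the list has choose(20, 10) = 184,756 elements.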
