R: List all subsets of elements from a vector whose sum just passes a value

Time: 2016-01-05 23:31:38

Tags: r constraints subset

Apologies in advance if the answer is (1) trivial, or (2) already out there and I simply could not find it online. Any pointers would be much appreciated!

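I need a piece of code that runs through a vector and returns all possible subsets of elements whose cumulative sum passes a threshold value.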

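Note that I do not want only the subsets that hit the threshold exactly. The cumulative sum may exceed the threshold, as long as the algorithm stops adding further elements once the value has been reached.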

# A tiny example of the kind of input data. 
# However, note that efficiency is an issue 
# (I need to replicate the example in a large dataset)
v <- seq(1, 3) # My vector
threshold <- 3 # My threshold value

# I would like to get a list with the combinations
# 1 2 
# 1 3
# 2 3
# 3 
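
The stopping rule can be made precise with a small predicate. This is only a sketch of one reading of the requirement: `stops_at_threshold` is a hypothetical helper name, and it assumes elements are added in the order in which they appear in the subset.

```r
# Hypothetical helper: TRUE if the running sum of `s` first reaches
# `threshold` exactly at its last element.
stops_at_threshold <- function(s, threshold) {
  running <- cumsum(s)
  n <- length(running)
  if (n == 0) return(FALSE)  # an empty subset never reaches the threshold
  # every proper prefix must stay below the threshold ...
  all(running[-n] < threshold) &&
    # ... and the full subset must reach or exceed it
    running[n] >= threshold
}

stops_at_threshold(c(1, 2), 3)     # TRUE
stops_at_threshold(c(1, 2, 3), 3)  # FALSE: 1 + 2 already reaches 3
```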

The code below works, but it is the clunkiest solution on earth...

for (i in seq_along(v)) {
  thisvalue <- v[i]
  if (thisvalue >= threshold) {
    cat(v[i], "\n", sep = "\t")
  } else {
    # seq_len(n) + i is empty when n == 0, unlike (i + 1):length(v)
    for (j in seq_len(length(v) - i) + i) {
      thisvalue <- v[i] + v[j]
      if (thisvalue >= threshold) {
        cat(c(v[i], v[j]), "\n", sep = "\t")
      } else {
        # k must start after j (not at i + 2) so that no element is reused
        for (k in seq_len(length(v) - j) + j) {
          thisvalue <- v[i] + v[j] + v[k]
          if (thisvalue >= threshold) {
            cat(c(v[i], v[j], v[k]), "\n", sep = "\t")
          }
        }
      }
    }
  }
}

2 Answers:

Answer 0: (score: 0)

This could be an option:

library(utils)
v <- seq (1,5)
v.len <- length(v)
threshold <- 3
for (count in seq(1,v.len))
{
  print(paste("subset length",count))
  combinations <- combn(v,count)
  out <- combinations[,apply(combinations, 2, sum)>=threshold]
  print (out)
}

The code above produces the following output:

[1] "subset length 1"
[1] 3 4 5
[1] "subset length 2"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    2    2    2    3    3     4
[2,]    2    3    4    5    3    4    5    4    5     5
[1] "subset length 3"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5
[1] "subset length 4"
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    2
[2,]    2    2    2    3    3
[3,]    3    3    4    4    4
[4,]    4    5    5    5    5
[1] "subset length 5"
[1] 1 2 3 4 5

So you still need to do something with the output, decide where to store it, and so on.
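
One way to store the result, sketched under assumptions: collect the qualifying combinations in a list instead of printing them, and keep only those whose running sum was still below the threshold before the last element was added. That extra filter is not part of the code above; it reflects the stopping rule from the question. The name `threshold_subsets` is made up for illustration.

```r
# Hypothetical helper: returns a list of subsets (in vector order) whose
# running sum first reaches `threshold` at the last element.
threshold_subsets <- function(v, threshold) {
  result <- list()
  for (count in seq_along(v)) {
    # Combine over positions rather than values, and return a list of
    # vectors so that a single match is not collapsed into a plain vector
    combos <- utils::combn(seq_along(v), count,
                           FUN = function(idx) v[idx], simplify = FALSE)
    keep <- vapply(combos, function(s) {
      running <- cumsum(s)
      all(running[-count] < threshold) && running[count] >= threshold
    }, logical(1))
    result <- c(result, combos[keep])
  }
  result
}

str(threshold_subsets(seq(1, 3), 3))
```

This still enumerates every combination before filtering, so the work grows exponentially with the length of the vector; it is meant to show the storage pattern, not to address the efficiency concern.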

Answer 1: (score: 0)

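With my limited coding skills I found a solution that may be inefficient, but it is more workable and more concise than writing endless nested loops.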

The function was inspired by the Java code in "Find all subsets of a set that sum up to n".

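recursive.subset <- function(x, index, current, threshold, result) {
  # Stop once index runs past the end; index:length(x) would otherwise count down
  if (index > length(x)) return(invisible(NULL))
  for (i in index:length(x)) {
    if (current + x[i] >= threshold) {
      # Threshold reached: print the subset and do not extend it further
      cat(result, x[i], "\n", sep = "\t")
    } else {
      # Still below the threshold: keep adding later elements
      recursive.subset(x, i + 1, current + x[i], threshold, c(result, x[i]))
    }
  }
}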

To call the function, simply run

inivector <- vector(mode="numeric", length=0) #initializing empty vector
recursive.subset (v, 1, 0, threshold, inivector)

So I get

1 2
1 3
2 3
    3
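
If the subsets are needed as data rather than as printed text, the same recursion can return them in a list. The following is a sketch, not a tuned implementation: `collect_subsets` is a hypothetical name, and the `seq_len` loop bound is an added guard so that the loop is skipped once `index` passes the end of the vector.

```r
# Hypothetical variant of the recursion that returns a list of subsets
collect_subsets <- function(x, threshold, index = 1, current = 0,
                            prefix = numeric(0)) {
  found <- list()
  # seq_len() yields an empty sequence when index > length(x),
  # unlike index:length(x), which would count downwards
  for (i in seq_len(length(x) - index + 1) + index - 1) {
    if (current + x[i] >= threshold) {
      # threshold reached: record this subset and stop extending it
      found <- c(found, list(c(prefix, x[i])))
    } else {
      # still below the threshold: try adding later elements
      found <- c(found, collect_subsets(x, threshold, i + 1,
                                        current + x[i], c(prefix, x[i])))
    }
  }
  found
}

res <- collect_subsets(seq(1, 3), 3)
str(res)
```

Growing the list with `c()` keeps the sketch short; for a large vector, the number of qualifying subsets is likely to dominate the cost either way.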