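How can I improve an algorithm that generates combinations of a multiset?

Time: 2013-05-13 03:45:13

Tags: java c++ algorithm optimization

How can I optimize the next() and hasNext() methods in the following generator, which produces combinations of a bounded multiset? (I am posting this under both C++ and Java because the code is C++-compatible and has no Java-specific elements that do not convert directly to C++.)

The specific problem areas of the algorithm are the hasNext() method as a whole, which may be unnecessarily complicated, and the line: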

if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;

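which has an if statement that I think could be removed somehow. I had an earlier version of the algorithm that did some of the backtracking before the return statement and therefore had a much simpler hasNext() test, but I could not get that version to work.

Background on this algorithm is hard to find. For example, in Knuth 7.2.1.3 he merely says that it can be done (and gives an exercise to prove that the algorithm is possible), but does not give an algorithm. Likewise, I have half a dozen advanced texts on combinatorics (including Papadimitriou and Kreher/Stinson), and none of them gives an algorithm for generating combinations of a multiset. Kreher leaves it as an "exercise for the reader". In any case, if you can improve the algorithm above or provide a reference to a working implementation that is more efficient than mine, I would appreciate it. Please give only iterative algorithms (no recursion, please).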

/** The iterator returns a 1-based array of integers. When the last combination is reached hasNext() will be false.
  * @param aiItems One-based array containing number of items available for each unique item type where aiItems[0] is the number of item types
  * @param ctSlots  The number of slots into which the items go
  * @return The iterator which generates the 1-based array containing the combinations or null in the event of an error.
  */
public static java.util.Iterator<int[]> combination( final int[] aiItems, final int ctSlots ){ // multiset combination into a limited number of slots
    CombinatoricIterator<int[]> iterator = new CombinatoricIterator<int[]>(){
        int xSlot;
        int xItemType;
        int ctItemType;
        int[] current = new int[ctSlots + 1];
        int[] aiItemsUsed = new int[aiItems[0] + 1];
        { reset(); current[0] = ctSlots; ctItemType = aiItems[0]; }
        public boolean hasNext(){
            int xUseSlot = ctSlots;
            int iCurrentType = ctItemType;
            int ctItemsUsed = 0;
            int ctTotalItemsUsed = 0;
            while( true ){
                int xUsedType = current[xUseSlot];
                if( xUsedType != iCurrentType ) return true;
                ctItemsUsed++;
                ctTotalItemsUsed++;
                if( ctTotalItemsUsed == ctSlots ) return false;
                if( ctItemsUsed == aiItems[xUsedType] ){
                    iCurrentType--;
                    ctItemsUsed = 0;
                }
                xUseSlot--;
            }
        }
        public int[] next(){
            while( true ){
                while( xItemType == ctItemType ){
                    xSlot--;
                    xItemType = current[xSlot];
                }
                xItemType++;
                while( true ){
                    while( aiItemsUsed[xItemType] == aiItems[xItemType] && xItemType != current[xSlot] ){
                        while( xItemType == ctItemType ){
                            xSlot--;
                            xItemType = current[xSlot];
                        }
                        xItemType++;
                    }
                    if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;
                    current[xSlot] = xItemType;
                    aiItemsUsed[xItemType]++;
                    if( xSlot == ctSlots ){
                        return current;
                    }
                    xSlot++;
                }
            }

        }
        public int[] get(){ return current; }
        public void remove(){}
        public void set( int[] current ){ this.current = current; }
        public void setValues( int[] current ){
            if( this.current == null || this.current.length != current.length ) this.current = new int[current.length];
            System.arraycopy( current, 0, this.current, 0, current.length );
        }
        public void reset(){
            xSlot = 1;
            xItemType = 0;
            Arrays.fill( current, 0 ); current[0] = ctSlots;
            Arrays.fill( aiItemsUsed, 0 ); aiItemsUsed[0] = aiItems[0];
        }
    };
    return iterator;
}

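Additional information

So far, some of the respondents do not seem to understand the difference between a set and a bounded multiset. A bounded multiset has repeated elements. For example, {a, a, b, b, b, c} is a bounded multiset, which in my algorithm would be encoded as {3, 2, 3, 1}. Note that the leading "3" is the number of item types (unique items) in the set. If you supply an algorithm, the following test should produce the output shown below.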

    private static void combination_multiset_test(){
        int[] aiItems = { 4, 3, 2, 1, 1 };
        int iSlots = 4;
        java.util.Iterator<int[]> iterator = combination( aiItems, iSlots );
        if( iterator == null ){
            System.out.println( "null" );
            System.exit( -1 );
        }
        int xCombination = 0;
        while( iterator.hasNext() ){
            xCombination++;
            int[] combination = iterator.next();
            if( combination == null ){
                System.out.println( "improper termination, no result" );
                System.exit( -1 );
            }
            System.out.println( xCombination + ": " + Arrays.toString( combination ) );
        }
        System.out.println( "complete" );
    }


1: [4, 1, 1, 1, 2]
2: [4, 1, 1, 1, 3]
3: [4, 1, 1, 1, 4]
4: [4, 1, 1, 2, 2]
5: [4, 1, 1, 2, 3]
6: [4, 1, 1, 2, 4]
7: [4, 1, 1, 3, 4]
8: [4, 1, 2, 2, 3]
9: [4, 1, 2, 2, 4]
10: [4, 1, 2, 3, 4]
11: [4, 2, 2, 3, 4]
complete
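
For cross-checking the expected output above, here is a minimal Python sketch of an iterative (non-recursive) successor routine. It is not the generator from the question. The function name multiset_combinations and the 0-based counts list (the question's aiItems without its leading type count) are assumptions made for illustration.

```python
def multiset_combinations(counts, k):
    """Yield the size-k combinations of a bounded multiset in lexicographic order.

    counts[t - 1] is the number of available items of type t (types are 1-based).
    Each result is a non-decreasing list of k item types.
    """
    n = len(counts)
    # cap[t] = total number of items whose type is >= t; cap[n + 1] = 0
    cap = [0] * (n + 2)
    for t in range(n, 0, -1):
        cap[t] = cap[t + 1] + counts[t - 1]
    if k > cap[1]:
        return  # not enough items to fill k slots
    combo = [0] * k

    def fill(start, t):
        # Greedily fill slots start..k-1 with the smallest types >= t.
        pos = start
        while pos < k:
            take = min(counts[t - 1], k - pos)
            combo[pos:pos + take] = [t] * take
            pos += take
            t += 1

    fill(0, 1)
    while True:
        yield list(combo)
        # Find the rightmost slot whose type can be raised while the
        # remaining slots can still be filled from larger-or-equal types.
        i = k - 1
        while i >= 0 and cap[combo[i] + 1] < k - i:
            i -= 1
        if i < 0:
            return  # last combination reached
        fill(i, combo[i] + 1)


for number, combo in enumerate(multiset_combinations([3, 2, 1, 1], 4), start=1):
    print(number, combo)
```

With counts [3, 2, 1, 1] and k = 4 this yields the same 11 combinations as the listing above, without the leading slot count. Because the sequence is non-decreasing, the feasibility test for each slot depends only on the precomputed suffix totals, so no per-type usage counter has to be maintained.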

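3 Answers:

Answer 0 (score: 1)

I wrote a simple helper class that has increment, highbit and for_each_bit.

I first wrapped an unsigned int, which limits it to 32 bits; if I felt ambitious I could extend it with std::bitset or a std::vector<uint32_t>. But with those 3 methods to start with, I can test it and get it working.

increment is easy, especially on a bare 32-bit int.

highbit returns the bit position of the highest set bit.

for_each_bit has this signature in C++: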

template<typename Lambda>
void for_each_bit( my_bignum const& num, Lambda&& func )

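and calls func with the index of each set bit in num.

That should take at most a few minutes to write.

Throw away hasNext and follow the iterator concept: you have a begin subset and an end subset, and the end one cannot be dereferenced to extract a value. Dereferencing these iterators produces the subset in question (or produces a factory for said subset).

end is now easy to work out: if highbit is >= the number of elements in your set, you are past the end of the permutations.

begin is either zero or 1, depending on whether you want the empty subset to be included.

next just increments your bignum.

Producing the subset just involves calling for_each_bit and putting the corresponding item from your set into the subset.

Next, improve increment to allow random access, and you can then implement iterating over the subsets in parallel!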
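
The counter-based subset enumeration described above might look like the following Python sketch. Python integers are unbounded, so the 32-bit limit does not apply here. The names subsets and include_empty are hypothetical, and having highbit return -1 for zero is an assumption of this sketch.

```python
def highbit(n):
    """Bit position of the highest set bit, or -1 when n == 0."""
    return n.bit_length() - 1


def for_each_bit(n, func):
    """Call func(index) for the index of every set bit in n, lowest first."""
    index = 0
    while n:
        if n & 1:
            func(index)
        n >>= 1
        index += 1


def subsets(items, include_empty=True):
    """Yield every subset of items by counting upward in binary."""
    counter = 0 if include_empty else 1  # the "begin" position
    # "end" is reached once the highest set bit falls outside the item list
    while highbit(counter) < len(items):
        subset = []
        for_each_bit(counter, lambda i: subset.append(items[i]))
        yield subset
        counter += 1  # "next" is a plain increment


for s in subsets(["a", "b", "c"]):
    print(s)
```

Because each subset corresponds to one counter value, a worker could start at any counter value and stop at another, which is the random-access property the answer mentions for parallel iteration.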

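This solves the set problem. To solve the multiset problem, first solve the derived set problem (where you pretend there is only 0 or 1 of each element), and iterate over that. Then, on each iteration of the derived set, build up a std::vector of the max count of each element.

Then do something like this: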

#include <utility>
#include <cstddef>
#include <vector>

using std::size_t;

namespace details {
template<typename Lambda>
  void for_each_multiset_combo_worker( std::vector<size_t> const& counts, Lambda&& lambda, size_t depth, std::vector<size_t>& current )
  {
    if (depth >= counts.size()) {
      lambda( current );
      return;
    }
    for (size_t i = 0; i <= counts[depth]; ++i) {
      // Assert: current.size() == depth
      current.push_back(i);
      // Assert: current.back() == i
      // Assert: current.size() == depth+1
      for_each_multiset_combo_worker( counts, lambda, depth+1, current );
      // Assert: current.back() == i
      // Assert: current.size() == depth+1
      current.pop_back();
      // Assert: current.size() == depth
    }
  }
}
template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
  std::vector<size_t> current;
  current.reserve( counts.size() );
  details::for_each_multiset_combo_worker( counts, std::forward<Lambda>(lambda), 0, current );
}
#include <iostream>

int main() {
  std::vector<size_t> multiset = {3, 2, 1, 1};
  size_t counter = 0;
  for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
    std::cout << counter << ": [";
    for(auto it = counts.begin(); it != counts.end(); ++it) {
      if (it != counts.begin()) {
        std::cout << ", ";
      }
      std::cout << *it;
    }
    std::cout << "]\n";
    ++counter;
  });
}

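Live example: http://ideone.com/8GN1xx

In this live example, I skipped the optimization of doing the set iteration first, and instead iterated directly over the multiset.

(Limitations: no more than the maximum size_t of elements of each type, and no more than the maximum capacity of std::vector of different types of elements.)

I have no need for the leading "number of distinct elements in the multiset", so I did not use it.

Here is the iterative version of the recursive algorithm above, using the usual "turn the implicit recursion stack into an explicit iteration stack" technique: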

#include <utility>
#include <cstddef>
#include <vector>

using std::size_t;

template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
  // below code is easier if I assume counts is non-empty:
  if (counts.empty())
  {
    lambda(counts);
    return;
  }
  // preallocate a buffer big enough to hold the output counts:
  std::vector<size_t> indexes;
  indexes.reserve( counts.size() );
  while(true) {
    // append 0s on the end of indexes if we have room:
    while (indexes.size() < counts.size()) {
      indexes.push_back(0);
    }
    // at this point, we have a unique element.  Pass it to the passed in lambda:
    lambda( indexes );
    // The advancement logic.  Advance the highest index.  If that overflows, pop it and
    // advance the next highest index:
    indexes.back()++;
    while (indexes.back() > counts[indexes.size()-1]) {
      indexes.pop_back();
      // we are done if we have managed to advance every index, and there are none left to advance:
      if (indexes.empty())
        return; // finished
      indexes.back()++;
    }
  }
}
#include <iostream>

int main() {
  std::vector<size_t> multiset = {3, 2, 1, 1};
  size_t counter = 0;
  for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
    std::cout << counter << ": [";
    for(auto it = counts.begin(); it != counts.end(); ++it) {
      if (it != counts.begin()) {
        std::cout << ", ";
      }
      std::cout << *it;
    }
    std::cout << "]\n";
    ++counter;
  });
}

http://ideone.com/x2Zp2f

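Answer 1 (score: 1)

Edit: answer completely reworked according to the clarified question.

Main idea: again, the resulting selection can be encoded much like a custom numeral system. One can increment a counter and interpret that counter as a selection.

However, there is an additional restriction: the size of the selection must equal target. A simple way to enforce the restriction is to check the size of each resulting selection and skip the ones that do not satisfy it. But that is slow.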
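
The simple but slow filtering approach just described can be sketched as follows, assuming the haves/target input format used by this answer. The function name selections_by_filtering is hypothetical.

```python
def selections_by_filtering(haves, target):
    """Count through a mixed-radix number and keep digit vectors summing to target.

    digits[i] ranges over 0..haves[i]; index 0 is the least significant digit.
    """
    digits = [0] * len(haves)
    while True:
        if sum(digits) == target:
            yield list(digits)
        # Plain increment with carry, least significant digit first.
        i = 0
        while i < len(digits) and digits[i] == haves[i]:
            digits[i] = 0
            i += 1
        if i == len(digits):
            return  # the counter wrapped around: every value was visited
        digits[i] += 1


for selection in selections_by_filtering([3, 2, 1], 3):
    print(selection)
```

For haves = [3, 2, 1] this walks through all 4 * 3 * 2 = 24 counter values to find the 6 selections of size 3, which illustrates why skipping invalid values one at a time is wasteful.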

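So what I did was implement a smarter increment that jumps directly to the next selection of the correct size.

Sorry, the code is in Python. But I wrote it in a way that is comparable to the Java iterator interface. The input and output format is: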

haves[i] := multiplicity of the i-th item in the collection
target := output collection must have this size

Code:

class Perm(object):
    def __init__(self,items,haves,target):
        assert sum(haves) >= target
        assert all(h > 0 for h in haves)
        self.items = items
        self.haves = haves
        self.target = target
        self.ans = None
        self.stop = False
    def __iter__(self):
        return self
    def reset(self):
        self.ans = [0]*len(self.haves)
        self.__fill(self.target)
        self.stop = False
    def __fill(self,n):
        """fill ans from LSB with n bits"""
        if n <= 0: return
        i = 0
        while n > self.haves[i]:
            assert self.ans[i] == 0
            self.ans[i] = self.haves[i]
            n -= self.haves[i]
            i += 1
        assert self.ans[i] == 0
        self.ans[i] = n
    def __inc(self):
        """increment from LSB, carry when 'target' or 'haves' constrain is broken"""
        # in fact, the 'target' constrain is always broken on the left most non-zero entry
        # find left most non-zero
        i = 0
        while self.ans[i] == 0:
            i += 1
        # set it to zero
        l = self.ans[i]
        self.ans[i] = 0
        # do increment answer, and carry
        while True:
            # increment to the next entry, if possible
            i += 1
            if i >= len(self.ans):
                self.stop = True
                raise StopIteration
            #
            if self.ans[i] == self.haves[i]:
                l += self.ans[i]
                self.ans[i] = 0
            else:
                l -= 1
                self.ans[i] += 1
                break
        return l
    def next(self):
        if self.stop:
            raise StopIteration
        elif self.ans is None:
            self.reset()
        else:
            l = self.__inc()
            self.__fill(l)
        return self.ans

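Note that the items argument is not really used.

The asserts in __init__ are there to make my assumptions about the input explicit.

The asserts in __fill just document a convenient property of self.ans in the context in which __fill is called.

Here is a simple skeleton for testing the code: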

test_cases = [([3,2,1], 3),
              ([3,2,1], 5),
              ([3,2,1], 6),
              ([4,3,2,1,1], 4),
              ([1,3,1,2,4], 4),
             ]

P = Perm(None,*test_cases[-1])
for p in P:
    print p
    #raw_input()

Example result for the input ([1,3,1,2,4], 4):

[1, 3, 0, 0, 0]
[1, 2, 1, 0, 0]
[0, 3, 1, 0, 0]
[1, 2, 0, 1, 0]
[0, 3, 0, 1, 0]
[1, 1, 1, 1, 0]
[0, 2, 1, 1, 0]
[1, 1, 0, 2, 0]
[0, 2, 0, 2, 0]
[1, 0, 1, 2, 0]
[0, 1, 1, 2, 0]
[1, 2, 0, 0, 1]
[0, 3, 0, 0, 1]
[1, 1, 1, 0, 1]
[0, 2, 1, 0, 1]
[1, 1, 0, 1, 1]
[0, 2, 0, 1, 1]
[1, 0, 1, 1, 1]
[0, 1, 1, 1, 1]
[1, 0, 0, 2, 1]
[0, 1, 0, 2, 1]
[0, 0, 1, 2, 1]
[1, 1, 0, 0, 2]
[0, 2, 0, 0, 2]
[1, 0, 1, 0, 2]
[0, 1, 1, 0, 2]
[1, 0, 0, 1, 2]
[0, 1, 0, 1, 2]
[0, 0, 1, 1, 2]
[0, 0, 0, 2, 2]
[1, 0, 0, 0, 3]
[0, 1, 0, 0, 3]
[0, 0, 1, 0, 3]
[0, 0, 0, 1, 3]
[0, 0, 0, 0, 4]

Performance: each call to next() takes O(h), where h is the number of item types (the size of the haves list).
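
The count vectors returned by this answer use a different format from the slot arrays in the question. A small conversion sketch, assuming 1-based item types and a leading slot count as in the question's expected output (counts_to_slots is a hypothetical helper name):

```python
def counts_to_slots(counts):
    """Expand per-type counts into the question's 1-based slot array.

    counts[i] is how many items of type i + 1 were selected. The result
    starts with the number of slots, followed by one item type per slot.
    """
    slots = []
    for item_type, count in enumerate(counts, start=1):
        slots.extend([item_type] * count)
    return [len(slots)] + slots


print(counts_to_slots([3, 1, 0, 0]))  # [4, 1, 1, 1, 2]
```

The conversion costs O(h + k) per selection, where k is the number of slots, so it does not change the order of the per-call cost when k is comparable to h.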

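Answer 2 (score: 0)

This paper gives an efficient iterative algorithm for generating multiset permutations, on page 8.

This paper gives another iterative algorithm, also on page 8.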