How can we count the r-combinations of a multiset?

Time: 2018-11-12 21:39:24

Tags: c algorithm math

Question

A split of a multiset M is an ordered pair of multisets of equal size whose union is M.

How can we count the splits of a multiset?

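Discussion

This question arose from this one.

This problem is equivalent to counting the r-combinations of a multiset with 2r elements. For a solution, see section 6.2 of Introductory Combinatorics by Richard A. Brualdi, 2nd edition, 1992, Prentice-Hall, Inc. To see that the two problems are the same, observe that there is a 1-1 correspondence between the r-combinations X of a multiset M with 2r elements and the splits (X, Y), where Y consists of all the elements of M that are not in X, i.e., Y = M − X.

Brualdi's solution, which uses the inclusion-exclusion principle, is well known to mathematicians and is suitable for algorithmic implementation.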
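For reference, the inclusion-exclusion count can be written out as follows. The notation is an assumption made here, not taken from Brualdi's text: the multiset has k types with multiplicities m_1, ..., m_k, and a binomial coefficient whose upper argument would come from a negative remainder is taken to be zero.

```latex
\[
  N(r) \;=\; \sum_{S \subseteq \{1,\dots,k\}} (-1)^{|S|}
  \binom{\,r - \sum_{i \in S} (m_i + 1) + k - 1\,}{k - 1},
  \qquad \text{where a term is } 0 \text{ if } r < \sum_{i \in S} (m_i + 1).
\]
% Worked example: multiplicities (3, 4, 5), r = 6, k = 3.
\[
  N(6) \;=\; \binom{8}{2} - \binom{4}{2} - \binom{3}{2} - \binom{2}{2}
       \;=\; 28 - 6 - 3 - 1 \;=\; 18 .
\]
```

Each set S picks out the types forced to exceed their limit. In the example, every pair of types already overshoots r = 6, so only the empty set and the three singletons contribute.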

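Although the result is well known to mathematicians, the question has received only partial answers here at Mathematics Stack Exchange and here at Quora.

3 answers:

Answer 0 (score: 1)

Here is a solution. I have included explanations in the comments.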

#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>


typedef uintmax_t UInt;
#define UIntFormat PRIuMAX

#define NumberOf(a) (sizeof (a) / sizeof *(a))


/*  In this code, a multiset is represented with:

        an integer N0 which is the number of types of elements in the set,

        an integer N1 which is the number of types of elements in the set that
        each appear finitely many times (N0-N1 types each appear infinitely
        many times), and

        an array M[] in which each M[i] is the number of times that an element
        of type i appears.

    Collectively, this is referred to as the multiset M.
*/


/*  Return the number of ways to choose r things from n things.  This is
    n! / (r! * (n-r)!).
*/
static UInt Choose(UInt r, UInt n)
{
    UInt result = 1;
    for (UInt i = 1; i <= r; ++i)
        result = result * n-- / i;
    return result;
}


//  Count the number of r-combinations of a multiset.
static UInt CountRCombinations(UInt r, size_t N0, size_t N1, UInt M[])
{
    /*  If we have only the unlimited types, there is a one-to-one
        correspondence between selections of r objects and arrangements of r
        objects with N0-1 dividers, each divider marking a transition from one
        type to the next.  For example, consider four objects of three types.
        Below "o" represents any object, and "|" is a divider.  For each
        arrangement of four o's and two |s, we show how it defines a selection
        of four objects of three types:

            oooo|| -> aaaa||
            ooo|o| -> aaa|b|
            ooo||o -> aaa||c
            oo|oo| -> aa|bb|
            oo|o|o -> aa|b|c
            oo||oo -> aa||cc
            o|ooo| -> a|bbb|
            o|oo|o -> a|bb|c
            o|o|oo -> a|b|cc
            o||ooo -> a||ccc
            |oooo| -> |bbbb|
            |ooo|o -> |bbb|c
            |oo|oo -> |bb|cc
            |o|ooo -> |b|ccc
            ||oooo -> ||cccc

        Therefore, the number of combinations equals the number of ways to
        arrange r indistinguishable objects of one type with N0-1
        indistinguishable objects of a different type.
    */
    if (N1 == 0)
        return Choose(r, r+N0-1);

    /*  Otherwise, we count the combinations:

            Select one of the limited types (we use the last one, N1-1, because
            it is easily removed from the array simply by reducing the size of
            the array).

            Count the number of combinations there would be if that type were
            unlimited.

            Count the number of combinations there would be if there were at
            least M[i]+1 instances of elements of that type.

            Subtract to get the number of combinations that have 0 to M[i]
            instances of elements of that type.
    */
    else
    {
        /*  Let M' be the multiset M with the last type changed to unlimited.

            So, where M is represented with N0, N1, M[], M' is represented with
            N0, N1-1, M[].
        */

        //  Change the last limited type to unlimited.
        N1 -= 1;

        //  Count the r-combinations drawn from M'.
        UInt C = CountRCombinations(r, N0, N1, M);

        /*  Now we count the combinations which have at least M[N1]+1 instances
            of the (formerly) last type.

            Consider that each such combination has M[N1]+1 instances of that
            type plus some combination of r - (M[N1]+1) elements drawn from M',
            including zero or more instances of the last type.  (Note that if r
            <= M[N1], there are no such combinations, since we would be asking
            for a negative number of elements.)

            So the number of combinations which have at least M[N1]+1 instances
            of the last type equals the number of combinations of
            r - (M[N1]+1) elements drawn from M'.
        */
        if (M[N1] < r)
            C -= CountRCombinations(r - (M[N1] + 1), N0, N1, M);

        return C;
    }
}


//  Count the number of splits of a multiset M that contains N types of things.
static UInt CountSplits(size_t N, UInt M[])
{
    //  Count the number of elements.
    UInt T = 0;
    for (size_t i = 0; i < N; ++i)
        T += M[i];

    //  Return the number of T/2-combinations of M.
    return T % 2 ? 0 : CountRCombinations(T/2, N, N, M);
}


int main(void)
{
    UInt M[] = { 3, 4, 5 };
    size_t N = NumberOf(M);

    printf("The number of splits of {");
    for (size_t i = 0; i < N; ++i) printf(" %" UIntFormat, M[i]);
    printf(" } is %" UIntFormat ".\n", CountSplits(N, M));
}
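As a sanity check on the result, the count can also be obtained by direct enumeration. The sketch below is not part of the answer's method: `brute_count` is a hypothetical helper that tries every feasible number of copies of each type in turn, so it is only practical for small inputs.

```c
#include <stddef.h>
#include <stdint.h>

/*  Hypothetical cross-check: count the ways to pick exactly r elements from
    n types, where type j supplies at most m[j] copies.  Plain enumeration,
    exponential in the worst case, intended only for testing small inputs.
*/
static uintmax_t brute_count(uintmax_t r, size_t n, const uintmax_t m[])
{
    //  With no types left, only the empty selection is possible.
    if (n == 0)
        return r == 0;

    //  Try every feasible number of copies of the first type.
    uintmax_t total = 0;
    for (uintmax_t take = 0; take <= m[0] && take <= r; ++take)
        total += brute_count(r - take, n - 1, m + 1);
    return total;
}
```

For the multiplicities { 3, 4, 5 } used in main above, `brute_count(6, 3, m)` gives 18, which matches a hand count of the 6-combinations of that multiset.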

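Answer 1 (score: 1)

Of course, the original question is isomorphic to the problem of counting the r-combinations of a multiset of size 2r.

As an alternative to the inclusion-exclusion formula (which certainly has analytic value but perhaps less algorithmic value), we can construct a recursive solution that computes the counts of k-combinations for all values of k, using a recursion very similar to the one for the binomial coefficient C(n, k), which is the count of k-combinations of a set of n elements.

Suppose we represent the multiset as a sorted vector V of size n (where n = 2r). (Technically, it does not need to be sorted; it is enough for it to be "clumped" so that all identical elements are consecutive. The simplest way to achieve that, however, is to sort the vector.) We want to produce all the unique k-combinations of this vector. Every such combination has one of two forms:

  • "Select the first element". The combination starts with V_1 and continues with a (k−1)-combination of (V_2, V_3, …, V_n).

  • "Skip the first element". The combination is a k-combination of (V_i, V_{i+1}, …, V_n), where i is the smallest index such that V_i ≠ V_1. (To avoid duplicates, we need to skip all of the elements identical to the first element.)

The only difference from the binomial recursion is the use of the index i in the second option. If the set has no repeated elements, this reduces to i = 2, which yields the recursion C(n, k) = C(n − 1, k − 1) + C(n − 1, k).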
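The two cases above can be sketched as a direct recursion. This is a minimal illustration rather than the answer's implementation: the name `count_naive` is hypothetical, the vector is assumed to be sorted, and elements are compared with `strcmp` to match the string representation used in the answer's own code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: count the distinct k-combinations of the sorted
 * vector v[0..n-1] by following the two cases literally.
 * Exponential time; for illustration only. */
static uintmax_t count_naive(const char **v, size_t n, size_t k)
{
    if (k == 0)
        return 1;               /* exactly one empty combination */
    if (k > n)
        return 0;               /* not enough elements left */

    /* "Select the first element": v[0], then k-1 of the rest. */
    uintmax_t with_first = count_naive(v + 1, n - 1, k - 1);

    /* "Skip the first element": skip every element equal to v[0]. */
    size_t i = 1;
    while (i < n && strcmp(v[i], v[0]) == 0)
        ++i;
    uintmax_t without_first = count_naive(v + i, n - i, k);

    return with_first + without_first;
}
```

With the vector { "A", "A", "B", "B" } and k = 2, this returns 3, corresponding to the combinations AA, AB and BB.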

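A naive implementation of this recursion takes exponential time, since each computation requires two recursive calls. However, there are only quadratically many distinct calls, so the computation can be reduced to quadratic time with memoization or dynamic programming. The solution below uses dynamic programming because it needs only linear space. (Memoization would need quadratic space, since there are quadratically many subproblems.)

The dynamic programming solution inverts the recursion by computing the number of k-combinations of successive suffixes of the vector. It only needs to retain the values for two suffixes: the previous suffix, and the previous suffix with a different first element. These correspond to the counts required by the first and second options of the recursion above. (The actual code uses prefixes rather than suffixes, but that makes absolutely no difference.)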
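The effect of processing one run of identical elements can be written as a recurrence. The symbols here are assumptions introduced for illustration: c_0(k) is the count of k-combinations of the prefix before the run, c_j(k) is the count after j copies of the repeated element have been processed, and c_0(k) = 0 for k < 0.

```latex
\[
  c_j(k) \;=\; c_{j-1}(k-1) + c_0(k), \qquad j = 1, \dots, m,
\]
\[
  \text{so that}\qquad c_m(k) \;=\; \sum_{j=0}^{m} c_0(k-j).
\]
```

In words, after a run of m equal elements, a k-combination uses between 0 and m copies of that element and fills the rest from the earlier prefix.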

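As a minor optimization, we only compute the counts of k-combinations for values of k between 0 and ⌈n/2⌉. As with binomial coefficients, the counts are symmetric: the number of k-combinations equals the number of (n − k)-combinations, because each k-combination corresponds to a unique (n − k)-combination consisting of all the unselected elements. Further optimizations are possible, based on the fact that only one count is needed at the end, but the added complexity would only obscure the basic algorithm.

The fact that the solution is O(n²) is slightly annoying, but since n is usually a small number (otherwise the counts would be astronomical), the computation time seems reasonable. There are undoubtedly further optimizations available, and a sub-quadratic algorithm may well exist.

Here is a basic implementation in C (using an array of strings):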

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Given a *sorted* vector v of size n, compute the number of unique k-combinations
 * of the elements of the vector for values of k from 0 through n/2.
 * The counts are stored in the vector r, which must have room for n/2 + 1 values.
 * Counts for larger values of k can be trivially looked up in the returned
 * vector, using the symmetry count(k) == count(n - k), i.e. r[n - k].
 * If v is not sorted, the result will be incorrect. The function does not
 * check for overflow, but the results will be correct modulo (UINTMAX_MAX + 1)
 */
void multicomb(const char** v, size_t n, uintmax_t* r) {
  size_t lim = n / 2 + 1;
  // Prev retains the counts for the previous prefix ending with
  // a different element
  uintmax_t prev[lim];
  // If there are no elements, there is 1 0-combination and no other combinations.
  memset(r, 0, sizeof prev);
  r[0] = 1;
  // Iterate over the unique elements of v
  for (size_t k = 0; k < n; ) {
    // Save the counts for this prefix
    memcpy(prev, r, sizeof prev);
    // Iterate over the prefixes with the same ending value
    do {
      for (size_t i = lim - 1; i > 0; --i)
        r[i] = r[i - 1] + prev[i];
    } while (++k < n && strcmp(v[k - 1], v[k]) == 0);
  }
}

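Compared with the OP's self-answered solution, this version:

  • overflows later, since it relies only on addition. (The absence of division also makes it easier to compute the counts modulo some modulus.)
  • takes quadratic time rather than exponential time.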
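For inputs given as a multiplicity array, as in the first answer, the same idea can be sketched over counts instead of strings. This is a variant written for illustration, not the code from either answer: `count_r_by_dp` is a hypothetical name, a running window sum replaces the inner per-copy loop, and returning 0 on allocation failure is an assumption made for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: number of r-combinations of a multiset whose types
 * have multiplicities m[0..ntypes-1].  Uses only addition and subtraction,
 * so results are exact modulo (UINTMAX_MAX + 1).
 * Returns 0 if memory cannot be allocated (assumption for brevity). */
static uintmax_t count_r_by_dp(const uintmax_t m[], size_t ntypes, size_t r)
{
    uintmax_t *cur = calloc(r + 1, sizeof *cur);
    uintmax_t *prev = malloc((r + 1) * sizeof *prev);
    if (!cur || !prev) {
        free(cur);
        free(prev);
        return 0;
    }

    cur[0] = 1;                 /* one way to choose nothing */
    for (size_t t = 0; t < ntypes; ++t) {
        memcpy(prev, cur, (r + 1) * sizeof *cur);
        /* cur[k] = prev[k] + prev[k-1] + ... + prev[k-m[t]],
         * maintained as a sliding window over prev. */
        uintmax_t window = 0;
        for (size_t k = 0; k <= r; ++k) {
            window += prev[k];
            if (k > m[t])
                window -= prev[k - m[t] - 1];
            cur[k] = window;
        }
    }

    uintmax_t result = cur[r];
    free(cur);
    free(prev);
    return result;
}
```

With multiplicities { 3, 4, 5 } and r = 6 this gives 18, and with { 2, 2 } and r = 2 it gives 3. The running time is proportional to the number of types times r, at the cost of one extra array of r + 1 counts.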

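Answer 2 (score: 0)

Eric, I don't think that's right. The original question was: if my source array contains A, A, B, B, in how many ways can it be uniquely split between two recipients? In this case the answer is three, because the array can be split in the following ways: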

Child Array 1 | Child Array 2
-----------------------------
A A           | B B
A B           | A B
B B           | A A

This source code (a C++ function) works, but it is slow and inefficient. Brute force and ignorance.

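#include <algorithm>
#include <vector>
using namespace std;

int countPermutations(vector<int> itemset) {

  // next_permutation only visits arrangements that follow the current one in
  // lexicographic order, so start from the sorted arrangement.
  sort(itemset.begin(), itemset.end());

  vector<vector<int> > permutations;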

  do {
    vector<int> match;

    for(int i = 0; i < itemset.size(); i+=2) {
         match.push_back(itemset.at(i));
    }
    sort(match.begin(), match.end());

    int j = 0;
    bool found = false;
    while (!found && (j < permutations.size())) {
        found = (match == permutations.at(j))?true:false;
        j++;
    }
    if (!found) {
        permutations.push_back(match);
    }

  } while (next_permutation(itemset.begin(), itemset.end()));

  return permutations.size();
}

Any ideas for optimization?