如何从java中的一组大小n迭代生成k个元素子集?

时间:2010-12-21 23:38:50

标签: java subset

我正在研究一个难题,包括分析所有大小的k子集并找出哪一个是最优的。我写了一个解决方案,当子集的数量很少时可以工作,但是对于更大的问题,它会耗尽内存。现在我正在尝试将用python编写的迭代函数转换为java,以便我可以在创建时分析每个子集,只获得表示它是如何优化的值,而不是整个集合,这样我就不会用完记忆。这是我到目前为止所做的事情,即使是非常小的问题也似乎没有完成:

public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
    int N = set.size();
    int maxsets = nCr(N, k);
    LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();

    int remains, thresh;
    LinkedList<Integer> newset; 
    for (int i=0; i<maxsets; i++)
    {
        remains = k;
        newset = new LinkedList<Integer>();
        for (int val=1; val<=N; val++)
        {
            if (remains==0)
                break;
            thresh = nCr(N-val, remains-1);
            if (i < thresh)
            {
                newset.add(set.get(val-1));
                remains --;
            }
            else 
            {
                i -= thresh;
            }
        }
        toRet.add(newset);
    }

    return toRet;

}

任何人都可以帮我调试这个函数或建议另一种算法来迭代生成大小为k的子集吗?

编辑:我终于让这个函数工作了,我不得不创建一个新的变量,我和i做了相同的th和thresh比较,因为python处理循环索引的方式不同。

7 个答案:

答案 0 :(得分:27)

首先,如果您打算对列表进行随机访问,则应选择一个有效支持该列表的列表实现。来自LinkedList上的javadoc:

  

所有操作都表现为可以预期的双重链接   名单。索引到列表中的操作将遍历列表   开头或结尾,以较接近指定的指数为准。

ArrayList的空间效率更高,随机访问速度更快。实际上,既然你事先知道了长度,你甚至可以使用普通数组。

算法:让我们开始简单:如何生成大小为1的所有子集?可能是这样的:

for (int i = 0; i < set.length; i++) {
    int[] subset = {i};
    process(subset);
}

where process是一个对集合执行某些操作的方法,例如检查它是否比目前处理的所有子集“更好”。

现在,您如何扩展它以适应大小为2的子集?大小为2的子集与大小为1的子集之间的关系是什么?那么,通过移除其最大元素,可以将大小为2的任何子集转换为大小为1的子集。换句话说,可以通过获取大小为1的子集并添加大于该组中所有其他元素的新元素来生成大小为2的每个子集。在代码中:

processSubset(int[] set) {
    int subset = new int[2];
    for (int i = 0; i < set.length; i++) {
        subset[0] = set[i];
        processLargerSets(set, subset, i);
    }
}

void processLargerSets(int[] set, int[] subset, int i) {
    for (int j = i + 1; j < set.length; j++) {
        subset[1] = set[j];
        process(subset);
    }
}

对于任意大小为k的子集,观察到大小为k的任何子集可以通过斩波最大元素变成大小为k-1的子集。也就是说,可以通过生成大小为k-1的所有子集来生成大小为k的所有子集,并且对于这些子集中的每一个,并且每个值大于子集中的最大值,将该值添加到该集合。在代码中:

static void processSubsets(int[] set, int k) {
    int[] subset = new int[k];
    processLargerSubsets(set, subset, 0, 0);
}

static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
    if (subsetSize == subset.length) {
        process(subset);
    } else {
        for (int j = nextIndex; j < set.length; j++) {
            subset[subsetSize] = set[j];
            processLargerSubsets(set, subset, subsetSize + 1, j + 1);
        }
    }
}

测试代码:

static void process(int[] subset) {
    System.out.println(Arrays.toString(subset));
}


public static void main(String[] args) throws Exception {
    int[] set = {1,2,3,4,5};
    processSubsets(set, 3);
}

但是在大​​集合上调用它之前,请记住子集的数量可以相当快地增长。

答案 1 :(得分:7)

你可以使用 org.apache.commons.math3.util.Combinations

示例:

import java.util.Arrays;
import java.util.Iterator;

import org.apache.commons.math3.util.Combinations;

public class tmp {
    public static void main(String[] args) {
        for (Iterator<int[]> iter = new Combinations(5, 3).iterator(); iter.hasNext();) {
            System.out.println(Arrays.toString(iter.next()));
        }
    }

}

输出:     [0,1,2]     [0,1,3]     [0,2,3]     [1,2,3]     [0,1,4]     [0,2,4]     [1,2,4]     [0,3,4]     [1,3,4]     [2,3,4]

答案 2 :(得分:3)

这是我最近写的一个组合迭代器

package psychicpoker;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import static com.google.common.base.Preconditions.checkArgument;

public class CombinationIterator<T> implements Iterator<Collection<T>> {

private int[] indices;
private List<T> elements;
private boolean hasNext = true;

public CombinationIterator(List<T> elements, int k) throws IllegalArgumentException {
    checkArgument(k<=elements.size(), "Impossible to select %d elements from hand of size %d", k, elements.size());
    this.indices = new int[k];
    for(int i=0; i<k; i++)
        indices[i] = k-1-i;
    this.elements = elements;
}

public boolean hasNext() {
    return hasNext;
}

private int inc(int[] indices, int maxIndex, int depth) throws IllegalStateException {
    if(depth == indices.length) {
        throw new IllegalStateException("The End");
    }
    if(indices[depth] < maxIndex) {
        indices[depth] = indices[depth]+1;
    } else {
        indices[depth] = inc(indices, maxIndex-1, depth+1)+1;
    }
    return indices[depth];
}

private boolean inc() {
    try {
        inc(indices, elements.size() - 1, 0);
        return true;
    } catch (IllegalStateException e) {
        return false;
    }
}

public Collection<T> next() {
    Collection<T> result = new ArrayList<T>(indices.length);
    for(int i=indices.length-1; i>=0; i--) {
        result.add(elements.get(indices[i]));
    }
    hasNext = inc();
    return result;
}

public void remove() {
    throw new UnsupportedOperationException();
}

}

答案 3 :(得分:3)

我今天遇到了同样的问题,即生成 n大小集的所有 k大小子集。

我有一个用Haskell编写的递归算法,但问题是我用Java编写了一个新版本。
在Java中,我想我可能不得不使用memoization来优化递归。事实证明,我找到了一种迭代方法。我受到来自维基百科的this image的启发,关于组合的文章。

计算所有 k大小的子集的方法

public static int[][] combinations(int k, int[] set) {
    // binomial(N, K)
    int c = (int) binomial(set.length, k);
    // where all sets are stored
    int[][] res = new int[c][Math.max(0, k)];
    // the k indexes (from set) where the red squares are
    // see image above
    int[] ind = k < 0 ? null : new int[k];
    // initialize red squares
    for (int i = 0; i < k; ++i) { ind[i] = i; }
    // for every combination
    for (int i = 0; i < c; ++i) {
        // get its elements (red square indexes)
        for (int j = 0; j < k; ++j) {
            res[i][j] = set[ind[j]];
        }
        // update red squares, starting by the last
        int x = ind.length - 1;
        boolean loop;
        do {
            loop = false;
            // move to next
            ind[x] = ind[x] + 1;
            // if crossing boundaries, move previous
            if (ind[x] > set.length - (k - x)) {
                --x;
                loop = x >= 0;
            } else {
                // update every following square
                for (int x1 = x + 1; x1 < ind.length; ++x1) {
                    ind[x1] = ind[x1 - 1] + 1;
                }
            }
        } while (loop);
    }
    return res;
}

二项式的方法:
(改编自Python示例,来自维基百科)

private static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    if (k > n - k) {    // take advantage of symmetry
        k = n - k;
    }
    long c = 1;
    for (int i = 1; i < k+1; ++i) {
        c = c * (n - (k - i));
        c = c / i;
    }
    return c;
}

当然,组合总会出现空间问题,因为它们可能会爆炸 在我自己的问题的背景下,最大可能的是大约2,000,000个子集。我的机器在1032毫秒内计算出来了。

答案 4 :(得分:2)

受afsantos的回答启发: - )...我决定编写一个C#.NET实现来生成一整套特定大小的所有子集组合。它不需要计算可能子集的总数;它检测到它何时到达终点。这是:

public static List<object[]> generateAllSubsetCombinations(object[] fullSet, ulong subsetSize) {
    if (fullSet == null) {
        throw new ArgumentException("Value cannot be null.", "fullSet");
    }
    else if (subsetSize < 1) {
        throw new ArgumentException("Subset size must be 1 or greater.", "subsetSize");
    }
    else if ((ulong)fullSet.LongLength < subsetSize) {
        throw new ArgumentException("Subset size cannot be greater than the total number of entries in the full set.", "subsetSize");
    }

    // All possible subsets will be stored here
    List<object[]> allSubsets = new List<object[]>();

    // Initialize current pick; will always be the leftmost consecutive x where x is subset size
    ulong[] currentPick = new ulong[subsetSize];
    for (ulong i = 0; i < subsetSize; i++) {
        currentPick[i] = i;
    }

    while (true) {
        // Add this subset's values to list of all subsets based on current pick
        object[] subset = new object[subsetSize];
        for (ulong i = 0; i < subsetSize; i++) {
            subset[i] = fullSet[currentPick[i]];
        }
        allSubsets.Add(subset);

        if (currentPick[0] + subsetSize >= (ulong)fullSet.LongLength) {
            // Last pick must have been the final 3; end of subset generation
            break;
        }

        // Update current pick for next subset
        ulong shiftAfter = (ulong)currentPick.LongLength - 1;
        bool loop;
        do {
            loop = false;

            // Move current picker right
            currentPick[shiftAfter]++;

            // If we've gotten to the end of the full set, move left one picker
            if (currentPick[shiftAfter] > (ulong)fullSet.LongLength - (subsetSize - shiftAfter)) {
                if (shiftAfter > 0) {
                    shiftAfter--;
                    loop = true;
                }
            }
            else {
                // Update pickers to be consecutive
                for (ulong i = shiftAfter+1; i < (ulong)currentPick.LongLength; i++) {
                    currentPick[i] = currentPick[i-1] + 1;
                }
            }
        } while (loop);
    }

    return allSubsets;
}

答案 5 :(得分:2)

solution对我有用:

 private static void findSubsets(int array[])
    {
      int numOfSubsets = 1 << array.length; 

      for(int i = 0; i < numOfSubsets; i++)
     {
        int pos = array.length - 1;
       int bitmask = i;

       System.out.print("{");
       while(bitmask > 0)
       {
        if((bitmask & 1) == 1)
         System.out.print(array[pos]+",");
        bitmask >>= 1;
        pos--;
       }
       System.out.print("}");
     }
    }

答案 6 :(得分:0)

快速实施:

以下是afsantos提供的答案的两个变体。

combinations函数的第一个实现镜像了原始Java实现的功能。

第二种实现是查找集合k[0, setSize)值的所有组合的一般情况。如果这真的是你所需要的,那么这种实现将会更有效率。

此外,它们包括一些小的优化和smidgin逻辑简化。

/// Calculate the binomial for a set with a subset size
func binomial(setSize: Int, subsetSize: Int) -> Int
{
    if (subsetSize <= 0 || subsetSize > setSize) { return 0 }

    // Take advantage of symmetry
    var subsetSizeDelta = subsetSize
    if (subsetSizeDelta > setSize - subsetSizeDelta)
    {
        subsetSizeDelta = setSize - subsetSizeDelta
    }

    // Early-out
    if subsetSizeDelta == 0 { return 1 }

    var c = 1
    for i in 1...subsetSizeDelta
    {
        c = c * (setSize - (subsetSizeDelta - i))
        c = c / i
    }
    return c
}

/// Calculates all possible combinations of subsets of `subsetSize` values within `set`
func combinations(subsetSize: Int, set: [Int]) -> [[Int]]?
{
    // Validate inputs
    if subsetSize <= 0 || subsetSize > set.count { return nil }

    // Use a binomial to calculate total possible combinations
    let comboCount = binomial(setSize: set.count, subsetSize: subsetSize)
    if comboCount == 0 { return nil }

    // Our set of combinations
    var combos = [[Int]]()
    combos.reserveCapacity(comboCount)

    // Initialize the combination to the first group of set indices
    var subsetIndices = [Int](0..<subsetSize)

    // For every combination
    for _ in 0..<comboCount
    {
        // Add the new combination
        var comboArr = [Int]()
        comboArr.reserveCapacity(subsetSize)
        for j in subsetIndices { comboArr.append(set[j]) }
        combos.append(comboArr)

        // Update combination, starting with the last
        var x = subsetSize - 1
        while true
        {
            // Move to next
            subsetIndices[x] = subsetIndices[x] + 1

            // If crossing boundaries, move previous
            if (subsetIndices[x] > set.count - (subsetSize - x))
            {
                x -= 1
                if x >= 0 { continue }
            }
            else
            {
                for x1 in x+1..<subsetSize
                {
                    subsetIndices[x1] = subsetIndices[x1 - 1] + 1
                }
            }
            break
        }
    }
    return combos
}


/// Calculates all possible combinations of subsets of `subsetSize` values within a set
/// of zero-based values for the set [0, `setSize`)
func combinations(subsetSize: Int, setSize: Int) -> [[Int]]?
{
    // Validate inputs
    if subsetSize <= 0 || subsetSize > setSize { return nil }

    // Use a binomial to calculate total possible combinations
    let comboCount = binomial(setSize: setSize, subsetSize: subsetSize)
    if comboCount == 0 { return nil }

    // Our set of combinations
    var combos = [[Int]]()
    combos.reserveCapacity(comboCount)

    // Initialize the combination to the first group of elements
    var subsetValues = [Int](0..<subsetSize)

    // For every combination
    for _ in 0..<comboCount
    {
        // Add the new combination
        combos.append([Int](subsetValues))

        // Update combination, starting with the last
        var x = subsetSize - 1
        while true
        {
            // Move to next
            subsetValues[x] = subsetValues[x] + 1

            // If crossing boundaries, move previous
            if (subsetValues[x] > setSize - (subsetSize - x))
            {
                x -= 1
                if x >= 0 { continue }
            }
            else
            {
                for x1 in x+1..<subsetSize
                {
                    subsetValues[x1] = subsetValues[x1 - 1] + 1
                }
            }
            break
        }
    }
    return combos
}