一切都在标题中.. =)
我想快速创建一个powerset,只有长度在3(最小常数长度)和最大值之间的子集。这个最大值应该是大多数时间的5,但我希望它变量(从3到5,6)。 原始集可能很大。我需要存储这些子集以供进一步处理。
我见过Obtaining a powerset of a set in Java,但在这里他们生成了电源组的每个子集。 我也看过efficient powerset algorithm for subsets of minimal length,对于C#,但我认为,正如Adam S所说,我将遇到低运行时间性能和内存问题。
我想将这些想法合并到一个理想的想法中。我最后的希望是快速生成每个子集(可能使用Guava中的算法)并迭代以获取所需的长度...但即使读它也很难看;)
有人有想法吗?
答案 0 :(得分:4)
基本上,当我遍历控制集合生成的位数组时,我检查确保位数在最小值和最大值之间。
这是代码。
import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;
public class PowerSet<E> implements Iterator<Set<E>>, Iterable<Set<E>> {
private int minSize;
private int maxSize;
private E[] arr = null;
private BitSet bset = null;
@SuppressWarnings("unchecked")
public PowerSet(Set<E> set, int minSize, int maxSize) {
this.minSize = Math.min(minSize, set.size());
this.maxSize = Math.min(maxSize, set.size());
arr = (E[]) set.toArray();
bset = new BitSet(arr.length + 1);
for (int i = 0; i < minSize; i++) {
bset.set(i);
}
}
@Override
public boolean hasNext() {
return !bset.get(arr.length);
}
@Override
public Set<E> next() {
Set<E> returnSet = new TreeSet<E>();
// System.out.println(printBitSet());
for (int i = 0; i < arr.length; i++) {
if (bset.get(i)) {
returnSet.add(arr[i]);
}
}
int count;
do {
incrementBitSet();
count = countBitSet();
} while ((count < minSize) || (count > maxSize));
// System.out.println(returnSet);
return returnSet;
}
protected void incrementBitSet() {
for (int i = 0; i < bset.size(); i++) {
if (!bset.get(i)) {
bset.set(i);
break;
} else
bset.clear(i);
}
}
protected int countBitSet() {
int count = 0;
for (int i = 0; i < bset.size(); i++) {
if (bset.get(i)) {
count++;
}
}
return count;
}
protected String printBitSet() {
StringBuilder builder = new StringBuilder();
for (int i = 0; i < bset.size(); i++) {
if (bset.get(i)) {
builder.append('1');
} else {
builder.append('0');
}
}
return builder.toString();
}
@Override
public void remove() {
throw new UnsupportedOperationException("Not Supported!");
}
@Override
public Iterator<Set<E>> iterator() {
return this;
}
public static void main(String[] args) {
Set<Character> set = new TreeSet<Character>();
for (int i = 0; i < 7; i++)
set.add((char) (i + 'A'));
PowerSet<Character> pset = new PowerSet<Character>(set, 3, 5);
int count = 1;
for (Set<Character> s : pset) {
System.out.println(count++ + ": " + s);
}
}
}
以下是测试结果:
1: [A, B, C]
2: [A, B, D]
3: [A, C, D]
4: [B, C, D]
5: [A, B, C, D]
6: [A, B, E]
7: [A, C, E]
8: [B, C, E]
9: [A, B, C, E]
10: [A, D, E]
11: [B, D, E]
12: [A, B, D, E]
13: [C, D, E]
14: [A, C, D, E]
15: [B, C, D, E]
16: [A, B, C, D, E]
17: [A, B, F]
18: [A, C, F]
19: [B, C, F]
20: [A, B, C, F]
21: [A, D, F]
22: [B, D, F]
23: [A, B, D, F]
24: [C, D, F]
25: [A, C, D, F]
26: [B, C, D, F]
27: [A, B, C, D, F]
28: [A, E, F]
29: [B, E, F]
30: [A, B, E, F]
31: [C, E, F]
32: [A, C, E, F]
33: [B, C, E, F]
34: [A, B, C, E, F]
35: [D, E, F]
36: [A, D, E, F]
37: [B, D, E, F]
38: [A, B, D, E, F]
39: [C, D, E, F]
40: [A, C, D, E, F]
41: [B, C, D, E, F]
42: [A, B, G]
43: [A, C, G]
44: [B, C, G]
45: [A, B, C, G]
46: [A, D, G]
47: [B, D, G]
48: [A, B, D, G]
49: [C, D, G]
50: [A, C, D, G]
51: [B, C, D, G]
52: [A, B, C, D, G]
53: [A, E, G]
54: [B, E, G]
55: [A, B, E, G]
56: [C, E, G]
57: [A, C, E, G]
58: [B, C, E, G]
59: [A, B, C, E, G]
60: [D, E, G]
61: [A, D, E, G]
62: [B, D, E, G]
63: [A, B, D, E, G]
64: [C, D, E, G]
65: [A, C, D, E, G]
66: [B, C, D, E, G]
67: [A, F, G]
68: [B, F, G]
69: [A, B, F, G]
70: [C, F, G]
71: [A, C, F, G]
72: [B, C, F, G]
73: [A, B, C, F, G]
74: [D, F, G]
75: [A, D, F, G]
76: [B, D, F, G]
77: [A, B, D, F, G]
78: [C, D, F, G]
79: [A, C, D, F, G]
80: [B, C, D, F, G]
81: [E, F, G]
82: [A, E, F, G]
83: [B, E, F, G]
84: [A, B, E, F, G]
85: [C, E, F, G]
86: [A, C, E, F, G]
87: [B, C, E, F, G]
88: [D, E, F, G]
89: [A, D, E, F, G]
90: [B, D, E, F, G]
91: [C, D, E, F, G]
答案 1 :(得分:1)
请说明元素的数量是否有界。
对于3 - 5,6个元素,索引数组是可行的,可能是short[6]
。
例如,对于最多32个元素,int
可以容纳该集合,其中一个可以:
// dumb:
for (int mask = 0; ; ;) {
int cardinality = Integer.bitCount(mask);
if (3 <= cardinality && cardinality <= 6) {
...
}
}
对于64个以上的元素,BitSet可能会提供紧凑性。
这里的决定受到简单统计的约束:有多少元素等等。 对于int / long / BitSet解决方案,可以从BitSet开始,因为它具有更好的API。例如,通过翻转两位,可以通过相同的位数转到下一个powerset。如果一个人在数学上倾向于,那么De Bruijn周期可能会很有趣。
答案 2 :(得分:0)
对于大小不超过63个元素的集合,如果您的最小大小约束不低,则此实现可能会更快:
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import static com.google.common.base.Preconditions.checkArgument;
import static java.lang.Math.min;
public class PowerSet<E> implements Iterator<Iterable<E>>, Iterable<Iterable<E>> {
private int minSize;
private int maxSize;
private E[] arr = null;
private long bits = 0;
private long count = 0;
private long minMask = 0;
/**
* Build a PowerSet of the given {@link Set} to iterate over subsets whose size is between the given boundaries
* @param set the set to create subsets from
* @param minSize the minimal size of the subsets
* @param maxSize the maximum size of the subsets
*/
@SuppressWarnings("unchecked")
public PowerSet(Set<E> set, int minSize, int maxSize) {
checkArgument(maxSize < 63); // limited to 63 because we need one additional bit for hasNext
this.minSize = min(minSize, set.size());
this.maxSize = min(maxSize, set.size());
arr = (E[]) set.toArray();
for (int n = 0; n < minSize; n++) {
bits |= (1L << n);
}
count = countBitSet(bits);
}
@Override
public boolean hasNext() {
return (bits & (1L << arr.length)) == 0;
}
@Override
public Iterable<E> next() {
// compute current subset
final List<E> returnSet = new LinkedList<>();
for (int i = 0; i < arr.length; i++) {
if ((bits & (1L << i)) != 0) {
returnSet.add(arr[i]);
}
}
// compute next bit map for next subset
do {
if (count < minSize) {
long maxFree = lowestIndex(bits) - 1;
long missing = minSize - count;
for (int n = 0; n < min(maxFree, missing); n++) {
bits |= (1L << n);
}
} else {
bits++;
}
count = countBitSet(bits);
} while ((count < minSize) || (count > maxSize));
return returnSet;
}
/**
* Lowest set bit in a long word
* @param i the long word
* @return lowest bit set
*/
private static long lowestIndex(long i) {
int n = 0;
while (n < 64 && (i & 1L) == 0) {
n++;
i = i >>> 1;
}
return n;
}
/**
* Compute the number of bit sets inside a word or a long word.<br/>
* <a href="http://en.wikipedia.org/wiki/Hamming_weight">Hamming weight</a>
* @param i the long word
* @return number of set bits
*/
private static long countBitSet(long i) {
i = i - ((i >>> 1) & 0x5555555555555555L);
i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);
i = ((i + (i >>> 4)) & 0x0F0F0F0F0F0F0F0FL);
return (i * (0x0101010101010101L)) >>> 56;
}
@Override
public void remove() {
throw new UnsupportedOperationException("Not Supported!");
}
@Override
public Iterator<Iterable<E>> iterator() {
return this;
}
}
它由一个long而不是BitSet支持,并更快地计算下一个子集... 对于最小约束,它也可能更快。
您可以删除仅用于构造函数checkArgument ...
的Guava依赖项