优化选择 - 通过从袋子列表中选择有限数量的袋子来最大化设置

时间:2016-07-27 03:53:42

标签: python optimization machine-learning artificial-intelligence set-cover

我有一个袋子清单。
如果允许N选择,我如何选择最大化我的设置的N个包? e.g。

choices = [ [A,A,Z], [B,A,E], [Z,Z,B,W], [Q], ...]

如果N = 2,我想知道......

choices[1, 2]

最大化我的设置......

set = [B, A, E, Z, W]


我试图强制适应这种梯度下降格式,但我在为此创建成本函数时遇到了麻烦。这是正确/合理的方法吗?

解决此问题的最佳方法是什么?


备注:
假设选择列表足够大,无法计算每种可能的选择组合 假设局部最优解是可接受的。

<小时/>
缩小问题......
语言:Python
问题规模:约200万选择; 1-100袋尺寸。

@sascha - 套装提示非常有帮助!

我在编写自己的脚本时采取了捷径,并修改了:https://cs.stackexchange.com/questions/16142/how-to-implement-greedy-set-cover-in-a-way-that-it-runs-in-linear-time

from collections import defaultdict
import random
import string
random.seed(123)  # For reproducibility

size = 100
bag_size = 5
F = []
for i in range(size):
    bag = [string.ascii_uppercase[random.randint(0, 25)] for j in range(bag_size)]
    F.append(set(bag))

print('First 5 sets... of 100')
for item in F[0:5]:
    print(item)

set_freq = {}
for bag in F:
    for item in bag:
        set_freq[item] = set_freq.get(item, 0) + 1
print('Unique items:', len(set_freq))

f_copy = F.copy()  # Because F gets modified

# First prepare a list of all sets where each element appears
D = defaultdict(list)
for y,S in enumerate(F):
    for a in S:
        D[a].append(y)

L=defaultdict(set)        
# Now place sets into an array that tells us which sets have each size
for x,S in enumerate(F):
    L[len(S)].add(x)

E=[] # Keep track of selected sets
# Now loop over each set size
for sz in range(max(len(S) for S in F),0,-1):
    if sz in L:
        P = L[sz] # set of all sets with size = sz
        while len(P):
            x = P.pop()
            E.append(x)
            for a in F[x]:
                for y in D[a]:
                    if y!=x:
                        S2 = F[y]
                        L[len(S2)].remove(y)
                        S2.remove(a)
                        L[len(S2)].add(y)
print('Results...\n')
print('Indices:', E)
captured = {}
for index in E:
    for item in f_copy[index]:
        captured[item] = captured.get(item, 0) + 1
print('Unique items captured:', len(captured))


打印...

First 5 sets... of 100
{'B', 'Y', 'N', 'I', 'C'}
{'D', 'M', 'B', 'R', 'I'}
{'K', 'F', 'B', 'R'}
{'K', 'E', 'W', 'R'}
{'H', 'A', 'F', 'N', 'Y'}
Unique items: 26
Results...

Indices: [0, 10, 48, 32, 69, 7, 2, 5]
Unique items captured: 26


我缺少的部分是......例如如果我只能选择3,我该如何最大化覆盖范围?

0 个答案:

没有答案