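Given a list of lists of tuples, I would like to find the subset of lists that maximizes the number of distinct integer values without repeating any integer.
The list looks something like this: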
    x = [
        [(1,2,3), (8,9,10), (15,16)],
        [(2,3), (10,11)],
        [(9,10,11), (17,18,19), (20,21,22)],
        [(4,5), (11,12,13), (18,19,20)]
    ]
The inner tuples are always sequential -> (1,2,3) or (15,16), but they may be of any length.
In this case, the expected return value would be:
    maximized_list = [
        [(1, 2, 3), (8, 9, 10), (15, 16)],
        [(4, 5), (11, 12, 13), (18, 19, 20)]
    ]
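This is valid because the two selected lists are kept intact, together contain 16 distinct integers (the most any valid selection from x can reach), and repeat no integer.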
If there are multiple valid solutions, all of them should be returned in a list.
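As a minimal sketch of those conditions, a candidate selection can be checked as below. The helper names `flatten` and `is_valid` are illustrative assumptions and are not part of the original code.

```python
from itertools import chain

x = [
    [(1, 2, 3), (8, 9, 10), (15, 16)],
    [(2, 3), (10, 11)],
    [(9, 10, 11), (17, 18, 19), (20, 21, 22)],
    [(4, 5), (11, 12, 13), (18, 19, 20)],
]


def flatten(selection):
    """Return every integer in a selection of lists of tuples, in order."""
    return list(chain.from_iterable(chain.from_iterable(selection)))


def is_valid(selection):
    """True when no integer occurs more than once across the selection."""
    ints = flatten(selection)
    return len(ints) == len(set(ints))


maximized_list = [x[0], x[3]]
print(is_valid(maximized_list), len(flatten(maximized_list)))  # True 16
print(is_valid([x[0], x[1]]))  # False: 2, 3 and 10 are repeated
```

Every other pair of lists in x shares at least one integer, so the first and last lists are the only combination of more than one list that passes this check.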
I have a naive implementation of this, heavily based on a previous Stack Overflow question of mine that was not as well formed as it could have been (Python: Find tuples with greatest total distinct values):
    import itertools

    def maximize(self, x):
        max_ = 0
        possible_patterns = []
        # Try every combination of 1..len(x) inner lists.
        for i in range(1, len(x)+1):
            b = itertools.combinations(x, i)
            for combo in b:
                # Flatten the combination into a single tuple of integers.
                all_ints = tuple(itertools.chain(*itertools.chain(*combo)))
                distinct_ints = tuple(set(all_ints))
                # Skip combinations that repeat an integer.
                if sorted(all_ints) != sorted(distinct_ints):
                    continue
                else:
                    if len(all_ints) >= max_:
                        if len(all_ints) == max_:
                            possible_patterns.append(combo)
                            new_max = len(all_ints)
                        elif len(all_ints) > max_:
                            possible_patterns = [combo]
                            new_max = len(all_ints)
                        max_ = new_max
        return possible_patterns
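The above function appears to give me the correct result, but it does not scale. I will need to accept x values with a few thousand lists (possibly as many as tens of thousands), so an optimized algorithm is required.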
Answer 0 (score: 2)
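The following solves for the maximal subset of sublists with respect to cardinality. It works by flattening each sublist into a set, building for each sublist the set of sublists it intersects, and then running a depth-first search over the solution space for the solution with the most elements (that is, the largest "weight").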
    def maximize_distinct(sublists):
        # Flatten each sublist of tuples into a set of its integers.
        subsets = [{x for tup in sublist for x in tup} for sublist in sublists]

        def intersect(subset):
            # Indices of all sublists sharing an integer with `subset` (itself included).
            return {i for i, sset in enumerate(subsets) if subset & sset}

        intersections = [intersect(subset) for subset in subsets]
        weights = [len(subset) for subset in subsets]
        pool = set(range(len(subsets)))
        max_set, _ = search_max(pool, intersections, weights)
        return [sublists[i] for i in max_set]

    def search_max(pool, intersections, weights):
        if not pool:
            return [], 0
        max_set = max_weight = None
        for num in pool:
            # Keep only later, non-conflicting indices so each subset is visited once.
            next_pool = {x for x in pool - intersections[num] if x > num}
            set_ids, weight = search_max(next_pool, intersections, weights)
            if max_weight is None or max_weight < weight + weights[num]:
                max_set, max_weight = [num] + set_ids, weight + weights[num]
        return max_set, max_weight
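To make the preprocessing step concrete, here is a small sketch of the three structures built for the x from the question. It reuses the same comprehensions as `maximize_distinct`; the variable name `other` is only an illustrative choice.

```python
x = [
    [(1, 2, 3), (8, 9, 10), (15, 16)],
    [(2, 3), (10, 11)],
    [(9, 10, 11), (17, 18, 19), (20, 21, 22)],
    [(4, 5), (11, 12, 13), (18, 19, 20)],
]

# Flatten each sublist of tuples into one set of integers.
subsets = [{v for tup in sublist for v in tup} for sublist in x]

# For each sublist, the indices of every sublist it shares an integer with.
# A non-empty sublist always intersects itself, so its own index is included.
intersections = [
    {i for i, other in enumerate(subsets) if subset & other}
    for subset in subsets
]

# The "weight" of a sublist is the number of distinct integers it holds.
weights = [len(subset) for subset in subsets]

print(weights)  # [8, 4, 9, 8]
for index, conflicts in enumerate(intersections):
    print(index, sorted(conflicts))
# 0 [0, 1, 2]
# 1 [0, 1, 2, 3]
# 2 [0, 1, 2, 3]
# 3 [1, 2, 3]
```

Only indices 0 and 3 are absent from each other's conflict sets, which is why the search in `search_max` ends up selecting the first and last sublists.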
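`search_max` could be optimized further by keeping a running total of the "weight" (the sum of the sublists' cardinalities) discarded so far, and pruning a branch of the search space as soon as that total exceeds the discarded weight of the current best solution (which is the minimum discarded weight found so far). However, unless you are running into performance problems, this is likely to be more work than it is worth, and for a small collection of lists the overhead of the bookkeeping will outweigh the speed-up from pruning.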
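A rough sketch of what such pruning might look like follows. It is not the answer's code: it uses the equivalent bound "weight chosen so far plus all weight still available cannot beat the best solution found", and the names `maximize_distinct_pruned`, `conflicts` and `search` are assumptions introduced here for illustration. Like the code above, it returns a single best selection and not all of them.

```python
def maximize_distinct_pruned(sublists):
    """Return one maximum-weight selection of mutually disjoint sublists."""
    subsets = [{v for tup in sublist for v in tup} for sublist in sublists]
    weights = [len(s) for s in subsets]
    n = len(subsets)
    # conflicts[i]: indices of the other sublists sharing an integer with i.
    conflicts = [
        {j for j in range(n) if j != i and subsets[i] & subsets[j]}
        for i in range(n)
    ]
    best_ids, best_weight = [], 0

    def search(candidates, chosen, weight):
        nonlocal best_ids, best_weight
        if weight > best_weight:
            best_ids, best_weight = list(chosen), weight
        # Weight still obtainable if every remaining candidate were taken.
        remaining = sum(weights[i] for i in candidates)
        for pos, i in enumerate(candidates):
            # Prune: this branch can no longer beat the best solution.
            if weight + remaining <= best_weight:
                return
            chosen.append(i)
            rest = [j for j in candidates[pos + 1:] if j not in conflicts[i]]
            search(rest, chosen, weight + weights[i])
            chosen.pop()
            # Candidate i is now discarded for the rest of this branch.
            remaining -= weights[i]

    search(list(range(n)), [], 0)
    return [sublists[i] for i in best_ids]


x = [
    [(1, 2, 3), (8, 9, 10), (15, 16)],
    [(2, 3), (10, 11)],
    [(9, 10, 11), (17, 18, 19), (20, 21, 22)],
    [(4, 5), (11, 12, 13), (18, 19, 20)],
]
print(maximize_distinct_pruned(x))
# [[(1, 2, 3), (8, 9, 10), (15, 16)], [(4, 5), (11, 12, 13), (18, 19, 20)]]
```

The search is still recursive and exponential in the worst case, so the bound only helps when good solutions are found early. Sorting the candidates by descending weight before searching is one possible way to encourage that, though it has not been measured here.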