I have a collection of lists, some of which have overlapping elements:
coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa'],
        ['eeee', 'eeef', 'gggg', 'gggi'],
        ['gggg', 'hhhh', 'iiii']]
I want to pool only the lists that overlap, which would produce
pooled = [['aaaa', 'aaab', 'abaa', 'bbbb', 'cccc'],
          ['eeee', 'eeef', 'gggg', 'gggi', 'hhhh', 'iiii'],
          ['dddd', 'dddd']]
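(In case it is unclear: the first and second lists both overlap with the third, so all of them should be merged together even though they share no elements with each other.)
"Overlap" means that two lists have at least one element in common. "Merge" means combining two lists into a single flat list or a single flat set.
There may be several such groups: for example, x, y, and z overlap one another and v and w overlap each other, but x + y + z does not overlap with v + w. There may also be lists that overlap with nothing.
(An analogy is families: unite all the Montagues and unite all the Capulets, but no Montague ever married a Capulet, so the two groups stay distinct.)
I don't care whether duplicates are included more than once.
What is a simple and reasonably fast way to do this in Python?
Edit: This does not appear to be a duplicate of "Yet another merging list of lists, but most pythonic way", because that question does not seem to consider groups that overlap only through a third group. The solutions I tried from that question did not produce the answer I want here.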
Answer 0 (score: 1)
Here is one way to do it (assuming you want unique elements in the overlapping result):
def over(coll):
    print('Input is:\n', coll)
    # gather the lists that share an element with at least one other list
    overlapping = [x for x in coll if any(x_element in [y for k in coll if k != x for y in k] for x_element in x)]
    # flatten and keep unique elements
    overlapping = sorted(set(z for x in overlapping for z in x))
    # get the rest
    non_overlapping = [x for x in coll if all(y not in overlapping for y in x)]
    # use the line below only if merged non-overlapping elements are desired
    # non_overlapping = sorted([y for x in non_overlapping for y in x])
    print('Output is:\n', [overlapping, non_overlapping])


coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa']]
over(coll)

coll = [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
over(coll)
Output:
$ python3 over.py
Input is:
 [['aaaa', 'aaab', 'abaa'], ['bbbb', 'bbbb'], ['aaaa', 'bbbb'], ['dddd', 'dddd'], ['bbbb', 'bbbb', 'cccc', 'aaaa']]
Output is:
 [['aaaa', 'aaab', 'abaa', 'bbbb', 'cccc'], [['dddd', 'dddd']]]
Input is:
 [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]
Output is:
 [[], [['aaaa', 'aaaa'], ['bbbb', 'bbbb']]]
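Note that `over` pools every overlapping list into one group, so with the question's full input (which also has the 'eeee'/'gggg' lists) the two families would end up in the same list. Below is a minimal sketch of one way to keep unrelated groups apart, using a union-find (disjoint-set) structure over list indices. This is a different technique from the one in this answer, and the names `pool_overlapping`, `find`, and `owner` are illustrative only.

```python
def pool_overlapping(coll):
    """Group lists that share elements, directly or through other lists."""
    parent = list(range(len(coll)))  # each list starts as its own group

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # element -> index of the first list that contained it
    for i, sub in enumerate(coll):
        for item in sub:
            if item in owner:
                # Link this list's group to the group of the earlier list.
                parent[find(i)] = find(owner[item])
            else:
                owner[item] = i

    groups = {}  # root index -> set of pooled elements
    for i, sub in enumerate(coll):
        groups.setdefault(find(i), set()).update(sub)
    return [sorted(g) for g in groups.values()]
```

Each element is looked up once, so the work grows roughly with the total number of elements rather than with the number of list pairs. Duplicates are dropped here because the groups are sets, which the question says is acceptable.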
Answer 1 (score: 1)
You can do this on sets with a successive-merging approach:
coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa'],
        ['eeee', 'eeef', 'gggg', 'gggi'],
        ['gggg', 'hhhh', 'iiii']]

pooled = [set(subList) for subList in coll]
merging = True
while merging:  # repeat until a full pass finds nothing to merge
    merging = False
    for i, group in enumerate(pooled):
        # first later group that shares an element with this one, if any
        merged = next((g for g in pooled[i+1:] if g.intersection(group)), None)
        if not merged:
            continue
        group.update(merged)
        pooled.remove(merged)
        merging = True

print(pooled)
# [{'aaaa', 'abaa', 'aaab', 'cccc', 'bbbb'}, {'dddd'}, {'gggg', 'eeef', 'eeee', 'hhhh', 'gggi', 'iiii'}]
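The loop above removes items from `pooled` while iterating over it, and relies on the outer `while` to repeat until nothing changes. As a hedged variant of the same set-merging idea, the sketch below processes the lists one at a time and rebuilds the list of groups on each step instead of mutating it in place. The function name `merge_overlapping` is hypothetical and not part of the original answer.

```python
def merge_overlapping(coll):
    """Return a list of disjoint sets, one per group of overlapping lists."""
    pooled = []  # invariant: the sets in here never share an element
    for sub in coll:
        current = set(sub)
        remaining = []
        for group in pooled:
            if group & current:
                # The new list bridges to this group, so absorb it.
                current |= group
            else:
                remaining.append(group)
        remaining.append(current)
        pooled = remaining
    return pooled


coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa'],
        ['eeee', 'eeef', 'gggg', 'gggi'],
        ['gggg', 'hhhh', 'iiii']]
print(merge_overlapping(coll))
```

Because the existing groups are always disjoint, a new list that touches several of them merges them all in one step, so a single pass over `coll` is enough. The order of the groups in the result may differ from the order shown above.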
Answer 2 (score: 0)
Following alkasm's suggestion in the comments, I used networkx:
import networkx as nx

coll = [['aaaa', 'aaab', 'abaa'],
        ['bbbb', 'bbbb'],
        ['aaaa', 'bbbb'],
        ['dddd', 'dddd'],
        ['bbbb', 'bbbb', 'cccc', 'aaaa'],
        ['eeee', 'eeef', 'gggg', 'gggi'],
        ['gggg', 'hhhh', 'iiii']]

# connect the indices of every pair of lists that share an element
edges = []
for i in range(len(coll)):
    a = coll[i]
    for j in range(len(coll)):
        if i != j:
            b = coll[j]
            if set(a).intersection(set(b)):
                edges.append((i, j))

G = nx.Graph()
G.add_nodes_from(range(len(coll)))  # keep lists that overlap with nothing
G.add_edges_from(edges)

# each connected component is one group of overlapping lists
for c in nx.connected_components(G):
    combined_lists = [coll[i] for i in c]
    flat_list = [item for sublist in combined_lists for item in sublist]
    print(set(flat_list))
Output:
{'cccc', 'bbbb', 'aaab', 'aaaa', 'abaa'}
{'dddd'}
{'eeef', 'eeee', 'hhhh', 'gggg', 'gggi', 'iiii'}
No doubt this could be optimized, but for now it seems to have solved my problem.
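One possible optimization, sketched here under my own assumptions rather than taken from the answer: the nested loop compares every pair of lists, which grows quadratically with the number of lists. If the graph is built over the elements instead of the list indices, linking consecutive items of each list is enough to make that list one connected chain, and no pairwise comparison is needed. The sketch uses only the standard library (a plain adjacency dict and a depth-first search) instead of networkx, and the name `connected_pools` is illustrative.

```python
from collections import defaultdict


def connected_pools(coll):
    """Return one set per connected group of elements."""
    # Adjacency map over elements: consecutive items of a list are linked.
    neighbours = defaultdict(set)
    for sub in coll:
        for item in sub:
            neighbours[item]  # register the node even if it has no links
        for a, b in zip(sub, sub[1:]):
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    pools = []
    for start in neighbours:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        pools.append(component)
    return pools
```

Two lists that share an element meet at that element's node, so their chains fall into the same component, including lists that are only linked through a third one.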