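I want to find matching items in the list given below. My list may be very large.
The first item of the tuple, 'N1_10', is duplicated: it also appears in a tuple in another inner list.
Tuple in the 1st inner list of ListA: ('N1_10', 'N2_28')
Tuple in the 2nd inner list of ListA: ('N1_10', 'N3_98')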
ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]
What I want as output is:
output -> [('N1_10', 'N2_28', 'N3_98'), ....
where anything else that matches one of the keys goes into the same tuple.
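If you think changing the data structure of ListA is a better option, please feel free to suggest it. Thanks for your help!
Simplified version
ListA = [[(a, x), (b, k), (c, l), (d, m)], [(e, d), (a, p), (g, s)], [...], [...], ...]
wantedOutput -> [(a, x, p), (b, k), (c, l), (d, m, e), (g, s), ...]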
Answer 0 (score: 3)
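Update: After rereading your question, it looks like you are trying to build equivalence classes rather than collect the values for each key. If [[(1, 2), (3, 4), (2, 3)]] should become [(1, 2, 3, 4)], then you need to interpret your input as a graph and apply a connected-components algorithm. You can convert your data structure to an adjacency list representation and traverse it with a breadth-first or depth-first search, or iterate over the list and build disjoint sets. Either way, your code will suddenly involve a lot of graph-related complexity, and it will be hard to guarantee any output ordering based on the order of the input. Here is an algorithm based on breadth-first search: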
import collections

# build an adjacency list representation of your input
graph = collections.defaultdict(set)
for l in ListA:
    for first, second in l:
        graph[first].add(second)
        graph[second].add(first)

# breadth-first search the graph to produce the output
output = []
marked = set()  # a set of all nodes whose connected component is known
for node in graph:
    if node not in marked:
        # this node is not in any previously seen connected component
        # run a breadth-first search to determine its connected component
        frontier = set([node])
        connected_component = []
        while frontier:
            marked |= frontier
            connected_component.extend(frontier)

            # find all unmarked nodes directly connected to frontier nodes
            # they will form the new frontier
            new_frontier = set()
            for node in frontier:
                new_frontier |= graph[node] - marked
            frontier = new_frontier
        output.append(tuple(connected_component))
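Don't just copy this without understanding it; understand what it is doing, or write your own implementation. You will probably need to be able to maintain it. (I would have used pseudocode, but Python is practically as simple as pseudocode already.)
In case my original interpretation of your question was correct, and your input is a collection of key-value pairs that you want to aggregate, here is my original answer:
Original answer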
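The disjoint-set alternative mentioned above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the names `connected_groups`, `find`, and `union` are made up for this example, and the grouping order relies on dictionaries preserving insertion order (Python 3.7+).

```python
def connected_groups(list_a):
    """Group items that are linked, directly or transitively, by any pair."""
    parent = {}  # maps each item to its parent in the disjoint-set forest

    def find(item):
        # Register unseen items as their own root.
        parent.setdefault(item, item)
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for sublist in list_a:
        for first, second in sublist:
            union(first, second)

    # Collect the members of each set under their root.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [tuple(members) for members in groups.values()]


# Sample input taken from the question.
ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]
print(connected_groups(ListA))
```

Compared with the breadth-first search, this version never builds the adjacency list, and items appear within each group in the order they were first seen in the input.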
import collections

clusterer = collections.defaultdict(list)

for l in ListA:
    for k, v in l:
        clusterer[k].append(v)

# list of (key, value_list) pairs
output = list(clusterer.items())
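defaultdict(list) is a dict that automatically creates a list as the value for any key that is not already present. The loop goes over all the tuples, collecting all the values that share the same key, and then creates a list of (key, value_list) pairs from the defaultdict.
(The output of this code is not quite in the form you specified, but I believe this form is more useful. If you want to change the form, that should be a simple matter.)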
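For instance, one way to get tuples in the shape the question asks for is to flatten each key and its collected values into a single tuple. This is a minimal sketch under the key-value interpretation; the variable names `sublist`, `key`, `value`, and `values` are chosen here for illustration.

```python
import collections

# Sample input taken from the question.
ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

clusterer = collections.defaultdict(list)
for sublist in ListA:
    for key, value in sublist:
        clusterer[key].append(value)

# Flatten each (key, value_list) pair into one tuple: (key, value1, value2, ...)
output = [(key, *values) for key, values in clusterer.items()]
print(output)
```

Note that this groups only on the first element of each pair, so in the simplified example it would not merge (d, m) with (e, d); that case needs the connected-components approach.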
Answer 1 (score: 2)
tupleList = [(1, 2), (3, 4), (1, 4), (3, 2), (1, 2), (7, 9), (9, 8), (5, 6)]

newSetSet = set([frozenset(aTuple) for aTuple in tupleList])
setSet = set()

# keep merging overlapping sets until a pass changes nothing
while newSetSet != setSet:
    print('*')
    setSet = newSetSet
    newSetSet = set()
    for set0 in setSet:
        merged = False
        for set1 in setSet:
            if set0 & set1 and set0 != set1:
                newSetSet.add(set0 | set1)
                merged = True
        if not merged:
            newSetSet.add(set0)

    # show the state before and after this pass
    print([tuple(element) for element in setSet])
    print([tuple(element) for element in newSetSet])
    print()

print([tuple(element) for element in newSetSet])

# Result: [(1, 2, 3, 4), (5, 6), (8, 9, 7)]
Answer 2 (score: 2)
Does the output order matter? This is the simplest approach I could think of:
ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

# idx maps each item to the set of items it is grouped with
idx = dict()

for sublist in ListA:
    for pair in sublist:
        for item in pair:
            mapping = idx.get(item, set())
            mapping.update(pair)
            idx[item] = mapping
            # propagate the updated group to every member of it
            for subitem in mapping:
                submapping = idx.get(subitem, set())
                submapping.update(mapping)
                idx[subitem] = submapping

# deduplicate the groups before printing
for x in set([frozenset(x) for x in idx.values()]):
    print(list(x))
Output:
['N3_72', 'N1_22']
['N2_28', 'N3_98', 'N1_10']
['N2_61', 'N3_37']
['N2_33', 'N3_28']
['N2_55', 'N3_62']
['N2_44', 'N1_35']