Question

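I want to find the matching items in the list given below. My list may be very large.

The first item of the tuple, "N1_10", is duplicated and matches an item in another array.

Tuple in the first array of ListA:
('N1_10', 'N2_28')

Tuple in the second array of ListA:

('N1_10', 'N3_98')

ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')], [('N1_22', 'N3_72'), ('N1_10', 'N3_98')], [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

The output I want is:

output -> [('N1_10', 'N2_28', 'N3_98'), ...], where anything else that matches one of the keys goes into the same tuple.

If you think changing the data structure of ListA is a better option, please feel free to suggest it! Thanks for your help!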

Simplified version

ListA = [[(a, x), (b, k), (c, l), (d, m)], [(e, d), (a, p), (g, s)], [...], [...], ...]

wantedOutput -> [(a, x, p), (b, k), (c, l), (d, m, e), (g, s), ...]

Answer 1

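Update: After rereading your question, it appears that you are trying to build equivalence classes rather than collect the values for each key. If

[[(1, 2), (3, 4), (2, 3)]]

should become

[(1, 2, 3, 4)]

then you will need to interpret your input as a graph and apply a connected components algorithm. You could turn your data structure into an adjacency list representation and traverse it with a breadth-first or depth-first search, or iterate over your list and build disjoint sets. In either case, your code will suddenly involve a lot of graph-related complexity, and it will be hard to provide any guarantees about output ordering based on the order of the input. Here is an algorithm based on a breadth-first search: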

import collections

# build an adjacency list representation of your input
graph = collections.defaultdict(set)
for l in ListA:
    for first, second in l:
        graph[first].add(second)
        graph[second].add(first)

# breadth-first search the graph to produce the output
output = []
marked = set() # a set of all nodes whose connected component is known
for node in graph:
    if node not in marked:
        # this node is not in any previously seen connected component
        # run a breadth-first search to determine its connected component
        frontier = set([node])
        connected_component = []
        while frontier:
            marked |= frontier
            connected_component.extend(frontier)

            # find all unmarked nodes directly connected to frontier nodes
            # they will form the new frontier
            new_frontier = set()
            for frontier_node in frontier:
                new_frontier |= graph[frontier_node] - marked
            frontier = new_frontier
        output.append(tuple(connected_component))

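Don't just copy this without understanding it; work out what it is doing, or write your own implementation. You will probably need to be able to maintain this. (I would have used pseudocode, but Python is practically as simple as pseudocode already.)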
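The disjoint-set alternative mentioned above can be sketched roughly as follows. This is only an illustration, not part of the original answer: the function name `connected_groups` and the sample data (the simplified list from the question, with strings standing in for a, x, and so on) are assumptions made for the example.

```python
def connected_groups(list_a):
    """Group items linked by any pair, using a disjoint-set (union-find)."""
    parent = {}

    def find(item):
        # Register unseen items as their own root.
        parent.setdefault(item, item)
        # Walk up to the root, shortening the path along the way.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    # Union the two ends of every pair.
    for sublist in list_a:
        for first, second in sublist:
            root_a, root_b = find(first), find(second)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the members of each root into one group.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [tuple(members) for members in groups.values()]


# Hypothetical sample data modelled on the question's simplified version.
ListA = [[('a', 'x'), ('b', 'k'), ('c', 'l'), ('d', 'm')],
         [('e', 'd'), ('a', 'p'), ('g', 's')]]
result = connected_groups(ListA)
```

As with the breadth-first version, the order of items inside each group and the order of the groups should not be relied on.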

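In case my original interpretation of your question was correct, and your input is a collection of key-value pairs that you want to aggregate, here is my original answer:

Original answer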

import collections

clusterer = collections.defaultdict(list)

for l in ListA:
    for k, v in l:
        clusterer[k].append(v)

# list of (key, value_list) pairs
output = list(clusterer.items())

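defaultdict(list) is a dict that automatically creates a list as the value for any key that is not already present. The loop goes over all the tuples, collecting all the values that match the same key, and then a list of (key, value_list) pairs is built from the defaultdict.

(The output of this code is not quite in the form you specified, but I believe this form is more useful. If you want to change the form, that should be a simple matter.)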
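As a rough illustration of changing the form, the sketch below flattens each (key, value_list) pair into a single tuple. The sample data is an assumption modelled on the question's simplified version, with strings standing in for a, x, and so on; the name `flattened` is made up for the example.

```python
import collections

# Hypothetical sample data modelled on the question's simplified version.
ListA = [[('a', 'x'), ('b', 'k'), ('c', 'l'), ('d', 'm')],
         [('e', 'd'), ('a', 'p'), ('g', 's')]]

clusterer = collections.defaultdict(list)
for sublist in ListA:
    for key, value in sublist:
        clusterer[key].append(value)

# Flatten each (key, value_list) pair into one tuple: key first, then values.
flattened = [(key,) + tuple(values) for key, values in clusterer.items()]
print(flattened)
```

Note that this groups only by the first element of each pair, so ('d', 'm') and ('e', 'd') stay separate here; merging them requires the connected-components interpretation.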

Answer 2

tupleList = [(1, 2), (3, 4), (1, 4), (3, 2), (1, 2), (7, 9), (9, 8), (5, 6)]

# start with one frozenset per tuple; duplicates such as (1, 2) collapse
newSetSet = set([frozenset(aTuple) for aTuple in tupleList])
setSet = set()

# keep merging overlapping sets until a pass changes nothing
while newSetSet != setSet:
    print('*')
    setSet = newSetSet
    newSetSet = set()
    for set0 in setSet:
        merged = False
        for set1 in setSet:
            if set0 & set1 and set0 != set1:
                newSetSet.add(set0 | set1)
                merged = True
        if not merged:
            newSetSet.add(set0)

        # debug output: the previous sets and the sets built so far
        print([tuple(element) for element in setSet])
        print([tuple(element) for element in newSetSet])
        print()

print([tuple(element) for element in newSetSet])

# Result (order may vary):  [(1, 2, 3, 4), (5, 6), (8, 9, 7)]
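To apply the same repeated-merge idea to the nested ListA from the question, the sublists first need to be flattened into a single list of pairs. A minimal sketch, with the debug printing dropped; `merge_overlapping`, `flat_pairs`, and `groups` are hypothetical names introduced for this example:

```python
def merge_overlapping(pairs):
    """Repeatedly union any sets that share an element until nothing changes."""
    previous = set()
    current = {frozenset(pair) for pair in pairs}
    while current != previous:
        previous = current
        current = set()
        for set0 in previous:
            merged = False
            for set1 in previous:
                if set0 & set1 and set0 != set1:
                    current.add(set0 | set1)
                    merged = True
            if not merged:
                current.add(set0)
    return current


ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

# Flatten the list of lists into one list of pairs.
flat_pairs = [pair for sublist in ListA for pair in sublist]
groups = merge_overlapping(flat_pairs)
```

Each pass compares every set against every other set, so the work per pass grows quadratically with the number of sets; for a very large list, a graph traversal or disjoint-set approach is likely a better fit.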

Answer 3

Does the output order matter? This is the simplest way I could think of:

ListA  = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],[('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
            [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

idx = dict()

for sublist in ListA:
    for pair in sublist:
        for item in pair:
            mapping = idx.get(item,set())
            mapping.update(pair)
            idx[item] = mapping 
            for subitem in mapping:
                submapping = idx.get(subitem,set())
                submapping.update(mapping)
                idx[subitem] = submapping


# de-duplicate the groups, then print each one
for x in set([frozenset(x) for x in idx.values()]):
    print(list(x))

Output:

['N3_72', 'N1_22']
['N2_28', 'N3_98', 'N1_10']
['N2_61', 'N3_37']
['N2_33', 'N3_28']
['N2_55', 'N3_62']
['N2_44', 'N1_35']

Finding duplicate items within a Python list of lists of tuples
