Question

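I have a set of unique tuples that looks like the following. The first value is the name, the second value is the ID, and the third value is the type.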

('9', '0000022', 'LRA')
('45', '0000016', 'PBM')
('16', '0000048', 'PBL')
('304', '0000042', 'PBL')
('7', '0000014', 'IBL')
('12', '0000051', 'LRA')
('7', '0000014', 'PBL')
('68', '0000002', 'PBM')
('356', '0000049', 'PBL')
('12', '0000051', 'PBL')
('15', '0000015', 'PBL')
('32', '0000046', 'PBL')
('9', '0000022', 'PBL')
('10', '0000007', 'PBM')
('7', '0000014', 'LRA')
('439', '0000005', 'PBL')
('4', '0000029', 'LRA')
('41', '0000064', 'PBL')
('10', '0000007', 'IBL')
('8', '0000006', 'PBL')
('331', '0000040', 'PBL')
('9', '0000022', 'IBL')

This set contains repeated name/ID combinations, but each repeat has a different type. For example:

('9', '0000022', 'LRA')
('9', '0000022', 'PBL')
('9', '0000022', 'IBL')

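What I want to do is process this set of tuples to build a new list in which each name/ID combination appears only once but carries all of its types. The list should contain only the name/ID combinations that have more than one type. For example, my output would look like this: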

('9', '0000022', 'LRA', 'PBL', 'IBL')
('7', '0000014', 'IBL', 'PBL', 'LRA')

But my output should not contain name/ID combinations that have only one type:

('45', '0000016', 'PBM')
('16', '0000048', 'PBL')

Any help is appreciated!

Answer 1

itertools.groupby, with some extra processing of what it outputs, will do the job:

from itertools import groupby

data = {
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('16', '0000048', 'PBL'),
    ...
}

def group_by_name_and_id(s):
    # groupby only merges adjacent items, so sort first; the key is (name, id).
    grouped = groupby(sorted(s), key=lambda row: (row[0], row[1]))
    for (name, id_), items in grouped:
        types = tuple(type_ for _, _, type_ in items)
        # Keep only the name/ID pairs that have more than one type.
        if len(types) > 1:
            yield (name, id_) + types

print('\n'.join(str(x) for x in group_by_name_and_id(data)))

Output (because the input is sorted, the types within each group come out in alphabetical order):

('10', '0000007', 'IBL', 'PBM')
('12', '0000051', 'LRA', 'PBL')
('7', '0000014', 'IBL', 'LRA', 'PBL')
('9', '0000022', 'IBL', 'LRA', 'PBL')
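
The sorted() call matters because groupby only merges adjacent items that share a key. A minimal sketch of the difference, using three rows from the question (the names rows, key, unsorted_keys and sorted_keys are made up for illustration):

```python
from itertools import groupby

# Three rows from the question, deliberately left out of order.
rows = [
    ('9', '0000022', 'LRA'),
    ('7', '0000014', 'IBL'),
    ('9', '0000022', 'PBL'),
]

def key(row):
    # Group on the (name, id) prefix of each tuple.
    return row[:2]

# Without sorting, the two '9' rows are not adjacent, so they form two groups.
unsorted_keys = [k for k, _ in groupby(rows, key=key)]

# After sorting, equal keys sit next to each other and form a single group.
sorted_keys = [k for k, _ in groupby(sorted(rows), key=key)]

print(unsorted_keys)
print(sorted_keys)
```

Here unsorted_keys contains ('9', '0000022') twice, while sorted_keys contains it once.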

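P.S. I don't really like that design, though: the types could (and should) be a list held in the third item of the tuple rather than part of the tuple itself. As written, the tuple's length is dynamic, which is ugly; tuples aren't meant to be used that way. So it would be better to replace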

        types = tuple(type_ for _, _, type_ in items)
        yield (name, id_) + types

with

        types = [type_ for _, _, type_ in items]
        yield (name, id_, types)

to get a much cleaner-looking result:

('10', '0000007', ['IBL', 'PBM'])
('12', '0000051', ['LRA', 'PBL'])
('7', '0000014', ['IBL', 'LRA', 'PBL'])
('9', '0000022', ['IBL', 'LRA', 'PBL'])

You can then iterate over the resulting data with, for example, "for name, id, types in transformed_data:".
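
A minimal, self-contained sketch of this list-based variant, written for Python 3 and run on a handful of rows from the question rather than the full set. The function name group_types is made up for illustration; transformed_data is the name used in the text above.

```python
from itertools import groupby

# A few rows from the question's data; the full set works the same way.
data = {
    ('9', '0000022', 'LRA'),
    ('9', '0000022', 'PBL'),
    ('9', '0000022', 'IBL'),
    ('45', '0000016', 'PBM'),
    ('10', '0000007', 'PBM'),
    ('10', '0000007', 'IBL'),
}

def group_types(rows):
    """Yield (name, id, [types]) for each name/ID pair with more than one type."""
    # groupby only merges adjacent items, so the rows are sorted first.
    for (name, id_), items in groupby(sorted(rows), key=lambda row: row[:2]):
        types = [type_ for _, _, type_ in items]
        if len(types) > 1:
            yield (name, id_, types)

transformed_data = list(group_types(data))

# Every result has exactly three fields, so unpacking is uniform.
for name, id_, types in transformed_data:
    print(name, id_, ', '.join(types))
```

Because every result tuple has a fixed length of three, the loop can unpack it without knowing how many types a given pair has.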

Answer 2

This is pretty straightforward: accumulate with a defaultdict, then filter.

from collections import defaultdict

d = defaultdict(list)
for tup in list_of_tuples:
    d[(tup[0],tup[1])].append(tup[2])

d
Out[15]: defaultdict(<class 'list'>, {('16', '0000048'): ['PBL'], ('9', '0000022'): ['LRA', 'PBL', 'IBL'], ('12', '0000051'): ['LRA', 'PBL'], ('304', '0000042'): ['PBL'], ('331', '0000040'): ['PBL'], ('41', '0000064'): ['PBL'], ('356', '0000049'): ['PBL'], ('15', '0000015'): ['PBL'], ('8', '0000006'): ['PBL'], ('4', '0000029'): ['LRA'], ('7', '0000014'): ['IBL', 'PBL', 'LRA'], ('32', '0000046'): ['PBL'], ('68', '0000002'): ['PBM'], ('439', '0000005'): ['PBL'], ('10', '0000007'): ['PBM', 'IBL'], ('45', '0000016'): ['PBM']})

Then filter:

[(key,val) for key,val in d.items() if len(val) > 1]
Out[29]: 
[(('9', '0000022'), ['LRA', 'PBL', 'IBL']),
 (('12', '0000051'), ['LRA', 'PBL']),
 (('7', '0000014'), ['IBL', 'PBL', 'LRA']),
 (('10', '0000007'), ['PBM', 'IBL'])]

If you really want it back in the original format:

from itertools import chain

[tuple(chain.from_iterable(tup)) for tup in d.items() if len(tup[1]) > 1]
Out[27]: 
[('9', '0000022', 'LRA', 'PBL', 'IBL'),
 ('12', '0000051', 'LRA', 'PBL'),
 ('7', '0000014', 'IBL', 'PBL', 'LRA'),
 ('10', '0000007', 'PBM', 'IBL')]

That said, I think it makes more sense to keep it as a dict with (name, id) tuples as keys, like the one we built in the first step.
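
A minimal sketch of keeping the filtered result as a dict, assuming Python 3 and a shortened list_of_tuples built from a few of the question's rows. The name multi_type is made up for illustration.

```python
from collections import defaultdict

# A shortened sample of the question's data.
list_of_tuples = [
    ('9', '0000022', 'LRA'),
    ('45', '0000016', 'PBM'),
    ('9', '0000022', 'PBL'),
    ('10', '0000007', 'PBM'),
    ('10', '0000007', 'IBL'),
    ('9', '0000022', 'IBL'),
]

# Accumulate the types under each (name, id) key.
d = defaultdict(list)
for name, id_, type_ in list_of_tuples:
    d[(name, id_)].append(type_)

# Keep only the name/ID pairs that have more than one type.
multi_type = {key: types for key, types in d.items() if len(types) > 1}

# Look up the types for one pair directly by its key.
print(multi_type[('9', '0000022')])
```

With the dict form, the types for any name/ID pair can be looked up by key instead of by scanning a list of tuples.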

Answer 3

An academic one-liner (the other answers are more readable, and probably more correct):

testlist = [('9', '0000022', 'LRA'),
            ('45', '0000016', 'PBM'),
            ('16', '0000048', 'PBL'),
            ('304', '0000042', 'PBL'),
            # etc.
            ]


from collections import Counter

# Count each (name, id) pair, keep the pairs seen more than once,
# then collect every type that goes with each kept pair.
new_list = [(a1, b1) + tuple(c for (a, b, c) in testlist if (a, b) == (a1, b1))
            for (a1, b1) in [pair for pair, count in
                             Counter((a, b) for (a, b, c) in testlist).items()
                             if count > 1]]

print(new_list)

Output:

[('9', '0000022', 'LRA', 'PBL', 'IBL'),
 ('12', '0000051', 'LRA', 'PBL'), 
 ('10', '0000007', 'PBM', 'IBL'), 
 ('7', '0000014', 'IBL', 'PBL', 'LRA')]
