Question

I have a list of tuples:

[('fruit', 'O'), ('is', 'O'), ('the', 'O'), ('subject', 'O'), ('of', 'O'), ('a', 'O'), ('Roald', 'PERSON'), ('Dahl', 'PERSON'), ('children', 'O'), ("'s", 'O'), ('book', 'O'), ('?', 'O')]

I want to reduce this list to:

[('fruit', 'O'), ('is', 'O'), ('the', 'O'), ('subject', 'O'), ('of', 'O'), ('a', 'O'), ('Roald Dahl', 'PERSON'), ('children', 'O'), ("'s", 'O'), ('book', 'O'), ('?', 'O')]

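That is, any consecutive tuples whose second value is not 'O' should have their first values joined. This should work for a list of any length and for any number of consecutive tuples.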

Answer 1

You can use itertools.groupby to group on the last element of each tuple:

import itertools
s = [(1, 2), (3, 4), (5, 4), (10, 4), (7, 8)]
s = [(a, list(b)) for a, b in itertools.groupby(s, key=lambda x:x[-1])]
final_s = [(sum(i[0] for i in b), a) for a, b in s]

Output:

[(1, 2), (18, 4), (7, 8)]
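
To see why this works, it may help to look at the intermediate grouping. The sketch below reuses the sample list from above; the names groups, key and run are only illustrative. groupby collects consecutive items that share a key, which is the behaviour the question asks for.

```python
import itertools

s = [(1, 2), (3, 4), (5, 4), (10, 4), (7, 8)]

# groupby only merges *consecutive* items with equal keys; items with the
# same key that are separated by a different key land in separate groups.
groups = [(key, list(run)) for key, run in itertools.groupby(s, key=lambda x: x[-1])]
print(groups)
# [(2, [(1, 2)]), (4, [(3, 4), (5, 4), (10, 4)]), (8, [(7, 8)])]
```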

Edit: for the new list of non-numeric tuples, you can try:

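import itertools
from functools import reduce

def remove(data, to_avoid='O'):
    # group consecutive tuples by their last element (the tag)
    s = [(a, list(b)) for a, b in itertools.groupby(data, key=lambda x: x[-1])]
    # keep to_avoid groups unchanged; collapse every other group into a single tuple
    final_s = [x for i in [b if a == to_avoid else [(reduce(lambda c, d: "{} {}".format(c, d), [h[0] for h in b]), a)] for a, b in s] for x in i]
    return final_s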


>>> remove([('fruit', 'O'), ('is', 'O'), ('the', 'O'), ('subject', 'O'), ('of', 'O'), ('a', 'O'), ('Roald', 'PERSON'), ('Dahl', 'PERSON'), ('children', 'O'), ("'s", 'O'), ('book', 'O'), ('?', 'O')])

Output:

[('fruit', 'O'), ('is', 'O'), ('the', 'O'), ('subject', 'O'), ('of', 'O'), ('a', 'O'), ('Roald Dahl', 'PERSON'), ('children', 'O'), ("'s", 'O'), ('book', 'O'), ('?', 'O')]

For those of us who find the version above hard to follow, here is one that uses operator.itemgetter instead of lambdas:

import itertools, operator
item0 = operator.itemgetter(0)
item1 = operator.itemgetter(1)
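# s is the original list of (word, tag) tuples from the question
result = []
for k, g in itertools.groupby(s, key=item1):
    if k != 'O':
        # join the words of a run of consecutive non-'O' tuples
        result.append((' '.join(map(item0, g)), k))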
    else:
        result.extend(g)
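
The loop above can also be wrapped in a function so it is easy to reuse. This is only a sketch: the name merge_runs and its skip parameter are made up for illustration, and the sample input is a shortened version of the list in the question.

```python
import itertools
import operator

def merge_runs(pairs, skip='O'):
    """Join the first items of consecutive pairs that share a non-`skip` tag."""
    result = []
    for tag, run in itertools.groupby(pairs, key=operator.itemgetter(1)):
        if tag == skip:
            # leave tuples tagged `skip` exactly as they are
            result.extend(run)
        else:
            # collapse the whole run into a single (joined words, tag) tuple
            result.append((' '.join(word for word, _ in run), tag))
    return result

tagged = [('a', 'O'), ('Roald', 'PERSON'), ('Dahl', 'PERSON'), ('children', 'O')]
merged = merge_runs(tagged)
print(merged)
# [('a', 'O'), ('Roald Dahl', 'PERSON'), ('children', 'O')]
```

One thing to keep in mind: because the grouping looks only at the tag, two different entities with the same tag that appear back to back would be merged into one entry.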
