Question

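I have two lists like the example below (in reality, a is much longer), and I want to remove all the elements they have in common, in this case the punctuation marks given in the list punctuation.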

a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member', 'comical', 'family', 'Intelligences', '.'],['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

Answer 1

If you need to preserve order, build a set of the words to remove and test each item for membership in it.

cleaned = [word for word in words if word not in blacklist]
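The one-liner above leaves words and blacklist undefined. As a minimal sketch, assuming blacklist is simply the question's punctuation list converted to a set and that the sublist structure of a should be kept, it could be applied like this:

```python
# Data from the question.
a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
      'comical', 'family', 'Intelligences', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Build the set once; membership tests on a set are O(1) on average.
blacklist = set(punctuation)

# Filter each sublist separately so order and duplicates are preserved.
cleaned = [[word for word in words if word not in blacklist] for words in a]
print(cleaned)
```

Note that tokens such as 'man,' and 'view,' stay in the result, because only elements that exactly equal a punctuation mark are removed.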

Answer 2

When order does not matter:

You can use set() operations, but you first have to flatten the nested list a (taken from https://github.com/dmcg/iterables-v-streams#ea8498ee0627fc59834001a837fa92fba4bcf47ebcf47e):

b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))

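cleaned is then a list like ['A', 'hug', 'heads', 'family', 'Intelligences', 'becomes', 'Jeans', 'lengthen', 'member', 'turn', 'mankind', 'view,', 'legs', 'man,', 'hips', 'comical']. The element order is arbitrary, and duplicates such as the second 'mankind' are dropped.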

When order matters:

A simple list comprehension, possibly slower than the set-based version:

cleaned = [x for x in b if x not in punctuation]

cleaned looks like ['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']
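The two variants differ in more than ordering. The following sketch, using the question's data and the hypothetical names unordered and ordered, compares them side by side:

```python
a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
      'comical', 'family', 'Intelligences', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Flatten the nested list into a single list of tokens.
b = [item for sublist in a for item in sublist]

# Set difference: arbitrary order, duplicates collapsed.
unordered = set(b) - set(punctuation)

# List comprehension: original order and duplicates preserved.
ordered = [x for x in b if x not in punctuation]

print(len(unordered), len(ordered))   # 16 17
print(ordered.count('mankind'))       # 2
```

The set-based result has one element fewer because 'mankind' appears twice in the input, so the set version is only appropriate when neither order nor repetition matters.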

Answer 3

You can do the following, but the order within each list may change.

[list(set(sublist)-set(punctuation)) for sublist in a]

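Using sets, you can remove the punctuation entries and then convert the result back to a list. The list comprehension does this for each sublist in the list.

If keeping the order is important, you can do this:

[[x for x in sublist if x not in punctuation] for sublist in a]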

Answer 4

You can do this:

>>> from itertools import chain
>>> list(filter(lambda e: e not in punctuation, chain.from_iterable(a)))
['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']

Or, if you want to keep the sublist structure:

>>> [list(filter(lambda e: e not in punctuation, sub)) for sub in a]
[['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences'], ['Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']]
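One caveat for Python 3: filter returns a lazy iterator rather than a list, so list-shaped output like that shown above only appears once the result is wrapped in list(). A short sketch on a shortened, made-up sample (not the question's full data) shows the behavior:

```python
from itertools import chain

# Shortened sample for illustration only.
a = [['A', ';', 'mankind'], ['legs', ',', 'hips', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# In Python 3 this is an iterator; nothing has been filtered yet.
lazy = filter(lambda e: e not in punctuation, chain.from_iterable(a))

first = list(lazy)    # consumes the iterator and builds the list
second = list(lazy)   # the iterator is now exhausted

print(first)    # ['A', 'mankind', 'legs', 'hips']
print(second)   # []
```

If the result is needed more than once, storing the list (or using a list comprehension instead of filter) avoids the exhausted-iterator surprise.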

How do I remove common elements from two lists?

4 answers: