I have two lists:
all_words_merged = ['ego', 'femina', 'incenderare', 'tuus', 'casa', 'et',
'cutullus', 'incipere', 'et', 'wingardium', 'leviosa']
class_words_merged = ['femina', 'incenderare', 'incipere', 'wingardium']
I want to take all_words_merged and remove every instance of the words that appear in class_words_merged. The resulting list should be:
result = ['ego', 'tuus', 'casa', 'et', 'cutullus', 'et', 'leviosa']
I tried the code below, but it returns an empty list:
result = [x for x in class_words_merged if x[0] in all_words_merged]
Answer 0 (score: 4)
If class_words_merged is large, converting it to a set first speeds things up:
>>> to_remove = set(class_words_merged)
>>> [word for word in all_words_merged if word not in to_remove]
['ego', 'tuus', 'casa', 'et', 'cutullus', 'et', 'leviosa']
With class_words_merged made 100 times larger:
large_class_words_merged = class_words_merged * 100
Creating the set first:
%%timeit
to_remove = set(large_class_words_merged)
[word for word in all_words_merged if word not in to_remove]
1000 loops, best of 3: 493 µs per loop
Scanning the list repeatedly:
%timeit [word for word in all_words_merged if word not in large_class_words_merged]
100 loops, best of 3: 3.18 ms per loop
Note: %timeit and %%timeit are IPython magic commands that I use in a Jupyter notebook.
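Outside IPython, roughly the same comparison can be sketched with the standard-library timeit module. This is only an illustration: the function names with_set and with_list and the choice of number=1000 are assumptions made for this sketch, and absolute timings will vary by machine.

```python
import timeit

all_words_merged = ['ego', 'femina', 'incenderare', 'tuus', 'casa', 'et',
                    'cutullus', 'incipere', 'et', 'wingardium', 'leviosa']
class_words_merged = ['femina', 'incenderare', 'incipere', 'wingardium']
large_class_words_merged = class_words_merged * 100


def with_set():
    # Build the set once; each membership test is then constant time on average.
    to_remove = set(large_class_words_merged)
    return [word for word in all_words_merged if word not in to_remove]


def with_list():
    # Each membership test scans the list from the start.
    return [word for word in all_words_merged
            if word not in large_class_words_merged]


# number=1000 is an arbitrary choice for this sketch.
set_seconds = timeit.timeit(with_set, number=1000)
list_seconds = timeit.timeit(with_list, number=1000)
print(f"set first:  {set_seconds:.4f} s for 1000 runs")
print(f"list scans: {list_seconds:.4f} s for 1000 runs")
```

Both functions return the same list; only the cost of each membership test differs.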
Answer 1 (score: 3)
You should iterate over all_words_merged and keep only the items that are not in class_words_merged:
result = [x for x in all_words_merged if x not in class_words_merged]
Output:
['ego', 'tuus', 'casa', 'et', 'cutullus', 'et', 'leviosa']
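As for why the attempt in the question comes back empty: it iterates over class_words_merged and tests x[0], the first character of each word, against all_words_merged. A minimal sketch of the difference (the names attempt and first_letters are introduced here only for illustration):

```python
all_words_merged = ['ego', 'femina', 'incenderare', 'tuus', 'casa', 'et',
                    'cutullus', 'incipere', 'et', 'wingardium', 'leviosa']
class_words_merged = ['femina', 'incenderare', 'incipere', 'wingardium']

# x[0] is the first character of each word in class_words_merged.
first_letters = [x[0] for x in class_words_merged]

# The attempt from the question: no single letter is an element of
# all_words_merged, so nothing passes the test.
attempt = [x for x in class_words_merged if x[0] in all_words_merged]

# Iterating over the list to keep, and testing whole words, works.
result = [x for x in all_words_merged if x not in class_words_merged]

print(first_letters)  # ['f', 'i', 'i', 'w']
print(attempt)        # []
print(result)         # ['ego', 'tuus', 'casa', 'et', 'cutullus', 'et', 'leviosa']
```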
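Edit:
If class_words_merged can contain duplicates (or is simply large), converting it to a set first can perform better, because set membership tests take constant time on average: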
cwm_set = set(class_words_merged)
result = [x for x in all_words_merged if x not in cwm_set]
Answer 2 (score: 0)
You can also do this with the built-in filter function, as follows:
>>> all_words_merged = ['ego', 'femina', 'incenderare', 'tuus', 'casa', 'et', 'cutullus', 'incipere', 'et', 'wingardium', 'leviosa']
>>> class_words_merged = ['femina', 'incenderare', 'incipere', 'wingardium']
>>>
>>> list(filter(lambda x: x not in class_words_merged, all_words_merged))
['ego', 'tuus', 'casa', 'et', 'cutullus', 'et', 'leviosa']
Python 3 needs the list() call because filter returns a lazy filter object; in Python 2 that is not necessary, and you can simply write:
>>> filter(lambda x: x not in class_words_merged, all_words_merged)
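To illustrate the Python 3 behaviour mentioned above, here is a small sketch (Python 3 only; the names filtered, first_pass and second_pass are made up for this example). The filter object produces items on demand and can be consumed only once:

```python
all_words_merged = ['ego', 'femina', 'incenderare', 'tuus', 'casa', 'et',
                    'cutullus', 'incipere', 'et', 'wingardium', 'leviosa']
class_words_merged = ['femina', 'incenderare', 'incipere', 'wingardium']

# In Python 3 this builds a lazy filter object; nothing is filtered yet.
filtered = filter(lambda x: x not in class_words_merged, all_words_merged)

# list() pulls every item through the filter.
first_pass = list(filtered)

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(filtered)

print(first_pass)   # ['ego', 'tuus', 'casa', 'et', 'cutullus', 'et', 'leviosa']
print(second_pass)  # []
```

If the result is needed more than once, it is simplest to keep the list rather than the filter object.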
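Edit:
Of course, this is not the most efficient approach, because the filter object has to be converted to a list; the timings below suggest as much (timeit.timeit runs the statement 1,000,000 times by default and reports the total in seconds):
>>> import timeit
>>> timeit.timeit(stmt='list(filter(lambda x: x not in c, a))', globals={'a':all_words_merged, 'c':class_words_merged})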
2.6026250364160717
>>> timeit.timeit(stmt='[x for x in a if x not in c]', globals={'a':all_words_merged, 'c':class_words_merged})
1.3826178676799827