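What is the most efficient way to compare two lists and keep only the elements of list A that do not match anything in list B, when the data sets are very large?
Example: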
words = ['shoe brand', 'car brand', 'smoothies for everyone', ...]
filters = ['brand', ...]
# Matching function
results = ['smoothies for everyone']
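There are already somewhat similar questions, but I am dealing with 1M+ words and filters, which overloads regular expressions. I used to do a simple 'filters[i] in words[j]' test with while loops, but that seems very inefficient.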
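For comparison, here is a minimal sketch of the substring test described above, written with for loops instead of while loops (the helper name `naive_filter` is hypothetical). It checks every filter against every phrase, so the work grows with len(words) * len(filters), and because `in` on two strings is a substring test, a filter like 'smooth' also removes 'smoothies for everyone'.

```python
def naive_filter(words, filters):
    """Keep phrases that contain none of the filters as a substring.

    Hypothetical helper mirroring the 'filters[i] in words[j]' test:
    every filter is checked against every phrase.
    """
    results = []
    for phrase in words:
        # `f in phrase` is a substring test, not a whole-word test
        if not any(f in phrase for f in filters):
            results.append(phrase)
    return results


words = ['shoe brand', 'car brand', 'smoothies for everyone']
print(naive_filter(words, ['brand']))   # ['smoothies for everyone']
print(naive_filter(words, ['smooth']))  # ['shoe brand', 'car brand']
```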
Answer 0 (score: 2)
You can make the filters a set:
>>> words = ['shoe brand', 'car brand', 'smoothies for everyone']
>>> filters = {'brand'}
>>> [w for w in words if all(i not in filters for i in w.split())]
['smoothies for everyone']
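This is better than your filters[i] in words[j] test because it compares whole words: if "smooth" were in the filter list, it would not filter out "smoothies".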
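The set-based comprehension above can be wrapped in a reusable function; this is a minimal sketch (the name `word_filter` is hypothetical) assuming words are separated by whitespace, as `str.split()` with no arguments does. The filter set is built once, so each phrase costs roughly one average-case O(1) hash lookup per word in the phrase, independent of how many filters there are.

```python
def word_filter(words, filters):
    """Keep phrases none of whose whitespace-separated words is in `filters`.

    Hypothetical helper wrapping the set-based comprehension.
    """
    filter_set = set(filters)  # build once; average O(1) membership tests
    return [w for w in words if all(tok not in filter_set for tok in w.split())]


words = ['shoe brand', 'car brand', 'smoothies for everyone']
print(word_filter(words, ['brand']))   # ['smoothies for everyone']
print(word_filter(words, ['smooth']))  # all three phrases are kept
```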
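Answer 1 (score: 2)
I tried a slightly modified version of @gnibbler's answer: it uses the set intersection operation instead of the all(...) generator expression. I believe this version is a bit faster.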
>>> words = ['shoe brand', 'car brand', 'smoothies for everyone']
>>> filters = {'brand'}
>>> [w for w in words if not set(w.split()).intersection(filters)]
['smoothies for everyone']
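A related variant, sketched here as an assumption rather than a measured improvement, uses `set.isdisjoint()`, which accepts any iterable. In CPython, passing it a list of tokens checks them one at a time against the filter set and stops at the first shared word, instead of first building `set(w.split())` for every phrase. The helper name `word_filter_disjoint` is hypothetical; whether it is actually faster on 1M+ phrases should be checked with `timeit` on real data.

```python
def word_filter_disjoint(words, filters):
    """Keep phrases that share no whole word with `filters`.

    Hypothetical helper: isdisjoint() takes the token list directly,
    so no temporary set is built per phrase.
    """
    filter_set = set(filters)  # build once, reuse for every phrase
    return [w for w in words if filter_set.isdisjoint(w.split())]


words = ['shoe brand', 'car brand', 'smoothies for everyone']
print(word_filter_disjoint(words, ['brand']))  # ['smoothies for everyone']
```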