Filter out sentences in a list that contain particular words

Date: 2016-01-10 23:14:57

Tags: list python-2.7

Suppose I have this list:

sentences = ['the cat slept', 'the dog jumped', 'the bird flew']

I want to filter out any sentence that contains a word from the following list:

terms = ['clock', 'dog']

I should get:

['the cat slept', 'the bird flew']

I tried this solution, but it doesn't work:

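empty = []
# This compares whole sentences against the terms rather than words, and
# x is local to the generator expression, so it is undefined on the next line
if any(x not in terms for x in sentences):
    empty.append(x)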

What is the best way to solve this?

4 answers:

Answer 0 (score: 0)

For readability, I would go with a solution like this rather than condensing it into a one-liner:

empty = []  # collects the sentences that contain none of the terms
for sentence in sentences:
    if all(term not in sentence for term in terms):
        empty.append(sentence)
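Wrapped in a self-contained function, the loop above can be run as-is. `filter_sentences` is a hypothetical name used only for illustration. Note that `term not in sentence` is a substring test on the whole string, so a term also matches inside a longer word.

```python
def filter_sentences(sentences, terms):
    """Keep only the sentences that contain none of the terms (substring test)."""
    kept = []
    for sentence in sentences:
        # 'term in sentence' looks for the term as a substring anywhere in the string
        if all(term not in sentence for term in terms):
            kept.append(sentence)
    return kept


sentences = ['the cat slept', 'the dog jumped', 'the bird flew']
terms = ['clock', 'dog']
print(filter_sentences(sentences, terms))
# ['the cat slept', 'the bird flew']

# Substring matching also catches 'dog' inside 'hotdog'
print(filter_sentences(['the hotdog was cold'], terms))
# []
```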

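Answer 1 (score: 0)

A simple brute-force O(m * n) approach (m sentences, n terms) using a list comprehension:

For each sentence, check whether any of the disallowed terms appears in it; keep the sentence only if none of them match.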

[s for s in sentences if not any(t in s for t in terms)]
# ['the cat slept', 'the bird flew']

Equivalently, you can invert the condition:

[s for s in sentences if all(t not in s for t in terms)]
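The two conditions are equivalent by De Morgan's law: "no term matches" is the same as "every term fails to match". A quick sketch that checks both forms on the sample data plus two extra sentences (the extra inputs are made up for illustration):

```python
sentences = ['the cat slept', 'the dog jumped', 'the bird flew',
             'a clock ticked', '']  # last two inputs made up for illustration
terms = ['clock', 'dog']

with_any = [s for s in sentences if not any(t in s for t in terms)]
with_all = [s for s in sentences if all(t not in s for t in terms)]

# not any(p) == all(not p), so both comprehensions keep the same sentences
assert with_any == with_all
print(with_any)
# ['the cat slept', 'the bird flew', '']
```

Both `any` and `all` short-circuit, so each form stops checking terms as soon as the answer for a sentence is known.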

Answer 2 (score: 0)

Similar to the two answers above, but using filter, which may be closer to the problem statement:

# Word-level check; list() is needed on Python 3, where filter returns an iterator
list(filter(lambda x: all(el not in terms for el in x.split(' ')), sentences))
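Because this version splits each sentence into words, it matches whole words only. A variation that is not part of the original answer is to turn `terms` into a set, so each word lookup is an average O(1) hash lookup instead of a scan of the list, and `set.isdisjoint` performs the "no shared word" test directly. `blocked` is a hypothetical name, and the sentence 'the hotdog ran' is made up to show the whole-word behavior.

```python
sentences = ['the cat slept', 'the dog jumped', 'the bird flew', 'the hotdog ran']
terms = ['clock', 'dog']

blocked = set(terms)  # hash the terms once for O(1) average membership tests

# Keep a sentence only if none of its whole words is a blocked term
result = [s for s in sentences if blocked.isdisjoint(s.split())]
print(result)
# ['the cat slept', 'the bird flew', 'the hotdog ran']
```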

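Answer 3 (score: 0)

Binary search, which is better optimized for very long sentences and term lists: each sentence's words are sorted once, so every term lookup takes O(log k) comparisons for a sentence of k words instead of a linear scan.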

from bisect import bisect_left


def binary_search(a, x):
    """Return the index of x in the sorted list a, or -1 if x is absent."""
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    return -1


sentences = ['the cat slept', 'the dog jumped', 'the bird flew', 'the a']
terms = ['clock', 'dog']

# Pair each sentence with its sorted word list so it can be binary-searched
sentences_with_sorted = [(sentence, sorted(sentence.split()))
                         for sentence in sentences]

valid_sentences = []
for sentence, sorted_words in sentences_with_sorted:
    # Keep the sentence only if no term is found among its words
    if all(binary_search(sorted_words, term) < 0 for term in terms):
        valid_sentences.append(sentence)

print(valid_sentences)
# ['the cat slept', 'the bird flew', 'the a']
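An exact-match lookup on a sorted list can be written with the standard `bisect_left` pattern, which has no special-case for the `hi` bound. A small sketch exercising its edge cases (first and last element, a value smaller or larger than everything present, an empty list); the sample words come from the question's data:

```python
from bisect import bisect_left


def binary_search(a, x):
    """Return the index of x in sorted list a, or -1 if x is absent."""
    i = bisect_left(a, x)  # leftmost position where x could be inserted
    if i != len(a) and a[i] == x:
        return i
    return -1


words = sorted('the dog jumped'.split())  # ['dog', 'jumped', 'the']
print(binary_search(words, 'dog'))    # 0  (first element)
print(binary_search(words, 'the'))    # 2  (last element)
print(binary_search(words, 'clock'))  # -1 (smaller than every word)
print(binary_search(words, 'zebra'))  # -1 (larger than every word)
print(binary_search([], 'dog'))       # -1 (empty list)
```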