How to efficiently check whether a string contains at least one element from each of two lists

Time: 2019-01-22 03:31:38

Tags: python

I have two lists and a list of sentences, as follows.

list1 = ['data mining', 'data sources', 'data']
list2 = ['neural networks', 'deep learning', 'machine learning']

sentences = ["mining data using neural networks has become a trend", "data mining is easy with python", "machine learning is my favorite", "data mining and machine learning are awesome", "data sources and data can been used for deep learning purposes", "data, deep learning and neural networks"]

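I want to select the sentences that contain elements from both list1 and list2, i.e. at least one element from each. That is, the output should be: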

["mining data using neural networks has become a trend", "data mining and machine learning are awesome", "data sources and data can been used for deep learning purposes", "data, deep learning and neural networks"]

My current code is as follows.

for sentence in sentences:
    for terms in list1:
        for words in list2:
            if terms in sentence:
                if words in sentence:
                    print(sentence)

However, this code is O(n^3) and not very efficient. Is there a more efficient way to do this in Python?

I am happy to provide more details if needed.

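3 Answers:

Answer 0 (score: 4)

Sets are more efficient than lists for membership lookups. Instead of nested loops with if statements, you can convert the two lists to sets and intersect (&) each sentence's words with them to find the sentences that contain words from both lists: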

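set1, set2 = set(list1), set(list2)
# Each sentence is split into words, so only single-word entries can match.
word_sets = [(sentence, set(sentence.split())) for sentence in sentences]
result = [sentence for sentence, words in word_sets if words & set1 and words & set2]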

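However, since your lists appear to contain phrases (sequences of words), it is hard to avoid multiple loops. You can at least break out of a loop as soon as a match is found, or continue to the next sentence as soon as a list has no match. You also do not need to nest the loops over the two lists inside each other.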

result = []
for sentence in sentences:
    for word in list1:
        if word in sentence:
            break
    else:
        continue
    for word in list2:
        if word in sentence:
            break
    else:
        continue
    result.append(sentence)

Result:

['mining data using neural networks has become a trend',
 'data mining and machine learning are awesome',
 'data sources and data can been used for deep learning purposes',
 'data, deep learning and neural networks']
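
To illustrate why word-level set intersection falls short when the lists contain phrases, here is a minimal sketch that compares the two kinds of matching on one of the sentences from the question. The variable names (`word_hits1`, `substring_hits1`, and so on) are made up for this illustration.

```python
list1 = ['data mining', 'data sources', 'data']
list2 = ['neural networks', 'deep learning', 'machine learning']
sentence = "data mining and machine learning are awesome"

# Word-level matching: the sentence is split on whitespace, so a
# multi-word entry such as 'data mining' can never be in the word set.
words = set(sentence.split())
word_hits1 = words & set(list1)
word_hits2 = words & set(list2)

# Substring matching: the `in` operator on strings finds whole phrases.
substring_hits1 = [term for term in list1 if term in sentence]
substring_hits2 = [term for term in list2 if term in sentence]

print(word_hits1)       # {'data'}
print(word_hits2)       # set()
print(substring_hits1)  # ['data mining', 'data']
print(substring_hits2)  # ['machine learning']
```

Because every entry in list2 is a phrase, the word-level check finds nothing from list2 and would reject this sentence, while the substring check accepts it.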

Answer 1 (score: 4)

You can take advantage of the short-circuiting behavior of all and any to improve performance:

list1 = ['data mining', 'data sources', 'data']
list2 = ['neural networks', 'deep learning', 'machine learning']
sentences = ["mining data using neural networks has become a trend", "data mining is easy with python", "machine learning is my favorite", "data mining and machine learning are awesome", "data sources and data can been used for deep learning purposes", "data, deep learning and neural networks"]

for sentence in sentences:
    if all(any(term in sentence for term in lst) for lst in (list1, list2)):
        print(sentence)
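
The same all/any check can be wrapped in a small function so that it works for any number of term lists and returns the selected sentences instead of printing them. This is only a sketch; `contains_one_from_each` and `selected` are hypothetical names chosen for the example.

```python
def contains_one_from_each(sentence, *term_lists):
    """Return True if the sentence contains at least one term from every list.

    any() stops at the first matching term in a list, and all() stops at
    the first list that has no matching term.
    """
    return all(any(term in sentence for term in terms) for terms in term_lists)


list1 = ['data mining', 'data sources', 'data']
list2 = ['neural networks', 'deep learning', 'machine learning']
sentences = [
    "mining data using neural networks has become a trend",
    "data mining is easy with python",
    "machine learning is my favorite",
    "data mining and machine learning are awesome",
    "data sources and data can been used for deep learning purposes",
    "data, deep learning and neural networks",
]

# Keep the sentences in their original order.
selected = [s for s in sentences if contains_one_from_each(s, list1, list2)]
print(selected)
```

Note that the worst case is unchanged (every term may still be tested against every sentence); short-circuiting only avoids work once the answer for a sentence is known.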

Answer 2 (score: 2)

Try reducing the loops like this:

list1 = ['data mining', 'data sources', 'data']
list2 = ['neural networks', 'deep learning', 'machine learning']

sentences = ["mining data using neural networks has become a trend", "data mining is easy with python", "machine learning is my favorite", "data mining and machine learning are awesome", "data sources and data can been used for deep learning purposes", "data, deep learning and neural networks"]

matches_list_1 = set()
matches_list_2 = set()

for index, sentence in enumerate(sentences):
    for terms in list1:
        if terms in sentence:
            matches_list_1.add(index)
    for terms in list2:
        if terms in sentence:
            matches_list_2.add(index)

# Sort the indices, because iteration order over a set is not guaranteed.
for index in sorted(matches_list_1 & matches_list_2):
    print(sentences[index])
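
Another way to cut down on Python-level loops, different from the index-set approach above, is to compile each list into a single regular expression so that one search per list replaces the loop over its terms. The following is a sketch that assumes plain substring matching is what is wanted; `build_pattern` and `select_sentences` are hypothetical helper names, and whether this is actually faster depends on the data and has not been measured here.

```python
import re


def build_pattern(terms):
    """Compile one regex that matches any of the given terms literally.

    Assumes the list is non-empty: an empty list would yield an empty
    pattern, which matches every string.
    """
    return re.compile("|".join(re.escape(term) for term in terms))


def select_sentences(sentences, *term_lists):
    """Return the sentences that contain a term from every list, in order."""
    patterns = [build_pattern(terms) for terms in term_lists]
    return [s for s in sentences if all(p.search(s) for p in patterns)]


list1 = ['data mining', 'data sources', 'data']
list2 = ['neural networks', 'deep learning', 'machine learning']
sentences = [
    "mining data using neural networks has become a trend",
    "data mining is easy with python",
    "machine learning is my favorite",
    "data mining and machine learning are awesome",
    "data sources and data can been used for deep learning purposes",
    "data, deep learning and neural networks",
]

result = select_sentences(sentences, list1, list2)
print(result)
```

`re.escape` keeps any special characters in the terms from being interpreted as regex syntax, so the match is equivalent to testing each term with `in`.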