Question

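I have two lists, and I want to find the keywords in the statements: if a statement contains a particular keyword, that keyword must be returned. I am currently doing this in O(n^2). Can I do it in O(n), or with some other lower complexity?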

keywords = ['name', 'class', 'school', 'address']

statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']

for key in keywords :
    for statement in statements :
            string = statement
            if string.find(key) != -1:
            print(key)

We can increase the space complexity if needed, but I need to reduce the time complexity. I just need the logic to achieve this.

Answer 1

Turn the keyword list into a set. That way, checking whether a word is a keyword is an O(1) lookup. (If you care about space complexity, use a radix tree instead.)

words = {'name', 'class', ...}

Then iterate over every word in the statements like this:

for statement in statements:
    for word in statement.split():
        if word in words:
            print(word)

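This is O(n * m), where m is the length of the longest string. I'm not sure how efficient str.split() is or how it works internally, but you could reduce the space complexity by scanning each statement manually, finding the words by checking for spaces, instead of creating a list in memory.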
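As a minimal sketch of the idea above, the following puts the set lookup into a function and walks the words lazily with re.finditer instead of building a list with str.split(). The function name keywords_in_statements is only illustrative. Note that this matches whole whitespace-separated words, not arbitrary substrings as str.find does.

```python
import re

def keywords_in_statements(keywords, statements):
    """Return every keyword occurrence, in the order the words appear."""
    keyword_set = set(keywords)  # O(1) average-case membership test
    found = []
    for statement in statements:
        # finditer yields one match at a time, so no word list is materialized
        for match in re.finditer(r"\S+", statement):
            word = match.group()
            if word in keyword_set:
                found.append(word)
    return found

keywords = ['name', 'class', 'school', 'address']
statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']
print(keywords_in_statements(keywords, statements))
```

Each character of each statement is visited once, so the running time is proportional to the total length of the statements, plus the cost of building the set.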

Answer 2

If you only want to find out whether any of the keywords occurs in any of the statements, first try str.join:

joined_statements = ' '.join(statements)
for key in keywords:
    if key in joined_statements:
        print(key)

Output:

name
school
address

Answer 3

Instead of

if string.find(key) != -1:

you can write

if key in string:

But in any case, the indentation is wrong, and return shouldn't work either.

Instead, you can do the following:

keywords = ['name', 'class', 'school', 'address']
checkedkeywords = []

statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']

for key in keywords:
    for statement in statements:
        # Substring test; appends the key once per matching statement
        if key in statement:
            checkedkeywords.append(key)

print(checkedkeywords)
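If each keyword should be reported only once, however many statements contain it, one possible variation uses any(), which stops scanning at the first matching statement. This is just a sketch, and the name matched_keywords is made up for illustration. It is still a nested scan in the worst case, so it tidies the code rather than lowering the complexity.

```python
def matched_keywords(keywords, statements):
    """Return each keyword that occurs as a substring of at least one statement."""
    # any() short-circuits as soon as one statement contains the keyword
    return [key for key in keywords if any(key in s for s in statements)]

keywords = ['name', 'class', 'school', 'address']
statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']
print(matched_keywords(keywords, statements))
```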

Hope this helps, good luck!

Answer 4

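So, you need to use the REVERSE INDEXING approach to solve this problem.

Create an empty dictionary, lookup_dict = {}

Now loop over every word in every statement and store the STATEMENTS_INDEX corresponding to that word, as shown below.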

statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']

lookup_dict = {
    'name': [0, 1],  # 'name' appears in the statements at index 0 and 1
    'is': [0, 1, 2, 3],
    'hello': [0, 1, 2, 3],
    'not': [1],
    'school': [2],
    'address': [3],
}

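Building the index is usually a one-time operation, which pays off when you have a large amount of data.

After that, whenever you need to check which keywords occur in the statements, just use the lookup dictionary.

Say you need to find all statements containing the keyword name: a single dictionary lookup gives you all of their indices.

This technique is called reverse indexing (also known as an inverted index). It is used by Lucene, and Solr uses Lucene internally.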
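A minimal sketch of how such an index could be built and queried follows. The function name build_inverted_index is hypothetical, and the sketch assumes words are separated by whitespace.

```python
def build_inverted_index(statements):
    """Map each word to the sorted list of statement indices that contain it."""
    index = {}
    for position, statement in enumerate(statements):
        for word in statement.split():
            positions = index.setdefault(word, [])
            # Skip the append when the word repeats within the same statement
            if not positions or positions[-1] != position:
                positions.append(position)
    return index

keywords = ['name', 'class', 'school', 'address']
statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']

lookup_dict = build_inverted_index(statements)  # one-time cost
for key in keywords:
    # Each lookup is O(1) on average; absent keywords give an empty list
    print(key, lookup_dict.get(key, []))
```

Building the index costs time proportional to the total number of words. Every later keyword query is then a single dictionary lookup.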

Answer 5

You need this: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm. Finding a string inside another string is not free. A simpler approach:

from collections import defaultdict

keywords = ['name', 'class', 'school', 'address']

statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']

# Map each word to the statements that contain it
word2statements = defaultdict(list)
for statement in statements:
    for word in set(statement.split()):
        word2statements[word].append(statement)

for keyword in keywords:
    # .get avoids inserting empty entries for keywords that never occur
    print(keyword, word2statements.get(keyword, []))
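For the substring case, where a keyword may appear anywhere inside a statement rather than as a whole word, the Aho-Corasick algorithm linked at the top of this answer scans each statement once for all keywords at the same time. The following is a compact pure-Python sketch of that idea, not a tuned implementation. The function names are made up for illustration, and it assumes every keyword is a non-empty string.

```python
from collections import deque

def build_automaton(keywords):
    """Build the trie, failure links and output lists for Aho-Corasick."""
    goto, fail, out = [{}], [0], [[]]  # node 0 is the root
    for kw in keywords:
        node = 0
        for ch in kw:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(kw)
    # Breadth-first pass: children of the root keep failure link 0
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out

def find_keywords_aho_corasick(keywords, statements):
    """Return the keywords occurring as substrings, in keyword-list order."""
    goto, fail, out = build_automaton(keywords)
    found = set()
    for statement in statements:
        node = 0
        for ch in statement:
            while node and ch not in goto[node]:
                node = fail[node]
            node = goto[node].get(ch, 0)
            found.update(out[node])
    return [kw for kw in keywords if kw in found]

keywords = ['name', 'class', 'school', 'address']
statements = ['name is hello', 'name is not hello', 'school is hello', 'address is hello']
print(find_keywords_aho_corasick(keywords, statements))
```

The search time is proportional to the total length of the statements plus the number of matches, after a one-time build proportional to the total length of the keywords.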

Comparing two lists in Python with O(n) complexity

5 answers: