Question

I have the following sentence and dictionary:

sentence = "I love Obama and David Card, two great people. I live in a boat"

dico = {
'dict1':['is','the','boat','tree'],
'dict2':['apple','blue','red'],
'dict3':['why','Obama','Card','two'],
}

I want to count, for each list in the dictionary, how many of its elements appear in the sentence. A heavy-handed way to do this is:

classe_sentence = []
text_splited = sentence.split(" ")
dic_keys = dico.keys()
for key_dics in dic_keys:
    for values in dico[key_dics]:
        if values in text_splited:
            classe_sentence.append(key_dics)

from collections import Counter
Counter(classe_sentence)

which gives the following output:

Counter({'dict1': 1, 'dict3': 2})

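However, this is not efficient at all: there are two nested loops, and each word is checked with a plain membership test against a list. I was wondering whether there is a faster way to do this, perhaps using something from itertools. Any ideas?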

Thanks in advance!

Answer 1

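You can use the set data type for all comparisons, and the set.intersection method to get the number of matches.

This makes the algorithm more efficient, but it counts each word only once, even if it appears several times in the sentence.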

sentence = set("I love Obama and David Card, two great people. I live in a boat".split())

dico = {
'dict1':{'is','the','boat','tree'},
'dict2':{'apple','blue','red'},
'dict3':{'why','Obama','Card','two'}
}


results = {}
for key, words in dico.items():
    results[key] = len(words.intersection(sentence))
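
If repeated words should be counted every time they occur, one possible variation is to count the words of the sentence with collections.Counter and sum the counts per dictionary. This is only a sketch under one assumption that is not in the question: words are extracted with the regular expression \w+, which strips punctuation, so "Card," matches "Card" and dict3 gets 3 here instead of the 2 produced by a plain split.

```python
import re
from collections import Counter

sentence = "I love Obama and David Card, two great people. I live in a boat"

dico = {
    'dict1': {'is', 'the', 'boat', 'tree'},
    'dict2': {'apple', 'blue', 'red'},
    'dict3': {'why', 'Obama', 'Card', 'two'},
}

# Assumption: a "word" is a run of letters, digits, or underscores,
# so trailing punctuation such as the comma in "Card," is dropped.
word_counts = Counter(re.findall(r"\w+", sentence))

# For each dictionary, add up how often each of its words occurs.
# A Counter returns 0 for words that are not in the sentence.
results = {
    key: sum(word_counts[word] for word in words)
    for key, words in dico.items()
}
print(results)  # {'dict1': 1, 'dict2': 0, 'dict3': 3}
```

The sentence is scanned once to build the counts, and each dictionary word then costs a single hash lookup.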

Answer 2

Assuming you want case-sensitive matching:

const
