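I have two lists, and I want to check the similarity between every word in one list and every word in the other, then find the maximum similarity. Here is my code: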
from nltk.corpus import wordnet
list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
list = []
for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        s = wordFromList1.wup_similarity(wordFromList2)
        list.append(s)
print(max(list))
But this results in an error:
wordFromList2 = wordnet.synsets(word2)[0]
IndexError: list index out of range
Please help me fix this. Thank you.
Answer 0 (score: 10)
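The error occurs when the list of synsets is empty and you try to take the element at index zero, which does not exist. But why check only the zeroth element? If you want to check everything, try all pairs of elements from the returned synsets. You can use itertools.product() to save yourself two for loops: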
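from itertools import product
sims = []
for word1, word2 in product(list1, list2):
    syns1 = wordnet.synsets(word1)
    syns2 = wordnet.synsets(word2)
    # An empty synset list simply yields no pairs, so no IndexError
    for sense1, sense2 in product(syns1, syns2):
        d = wordnet.wup_similarity(sense1, sense2)
        # Store the two senses that were compared, not the full synset lists
        sims.append((d, sense1, sense2))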
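This is inefficient because the same synsets are looked up again and again, but it is closest to the logic of your code. If you have enough data for speed to matter, you can speed it up by collecting the synsets for all the words in list1 and list2 once, and then taking the product of the two sets of synsets.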
>>> allsyns1 = set(ss for word in list1 for ss in wordnet.synsets(word))
>>> allsyns2 = set(ss for word in list2 for ss in wordnet.synsets(word))
>>> best = max((wordnet.wup_similarity(s1, s2) or 0, s1, s2) for s1, s2 in
...            product(allsyns1, allsyns2))
>>> print(best)
(0.9411764705882353, Synset('command.v.02'), Synset('order.v.01'))
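The set-based version above reports the best pair of synsets but no longer tells you which input words they came from. As a rough sketch of one way to keep that link while still looking up each word only once, the function below takes the lookup and similarity functions as parameters. The names best_pair, lookup, and similarity, as well as the toy data, are assumptions made for illustration and are not part of NLTK; the toy stand-ins only exist so the sketch runs without the WordNet corpus.

```python
from itertools import product


def best_pair(words1, words2, lookup, similarity):
    """Return (score, word1, sense1, word2, sense2) for the best-scoring
    pair of senses, or None if no pair could be scored."""
    # Look up every distinct word once instead of once per word pair.
    cache = {w: lookup(w) for w in set(words1) | set(words2)}
    best = None
    for w1, w2 in product(words1, words2):
        # Words with no senses contribute no pairs, so nothing is indexed.
        for s1, s2 in product(cache[w1], cache[w2]):
            score = similarity(s1, s2)
            if score is None:  # the similarity function found no score
                continue
            if best is None or score > best[0]:
                best = (score, w1, s1, w2, s2)
    return best


# Hypothetical toy data standing in for WordNet (illustration only).
toy_senses = {
    "compare": ["compare.v.01"],
    "match": ["match.v.01", "match.n.01"],
    "how": [],
}
toy_scores = {("compare.v.01", "match.v.01"): 0.8}


def toy_lookup(word):
    return toy_senses.get(word.lower(), [])


def toy_similarity(s1, s2):
    return toy_scores.get((s1, s2))  # None when the pair has no score


result = best_pair(["Compare"], ["match", "how"], toy_lookup, toy_similarity)
print(result)
```

With NLTK installed and the WordNet corpus downloaded, the same function should work by passing wordnet.synsets as lookup and lambda a, b: a.wup_similarity(b) as similarity.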
Answer 1 (score: 8)
Try checking whether these lists are empty before using them:
from nltk.corpus import wordnet
list1 = ['Compare', 'require']
list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show', 'spell', 'state', 'tell', 'trace', 'write']
similarities = []  # renamed from `list` to avoid shadowing the built-in
for word1 in list1:
    for word2 in list2:
        wordFromList1 = wordnet.synsets(word1)
        wordFromList2 = wordnet.synsets(word2)
        if wordFromList1 and wordFromList2:  # Thanks to @alexis' note
            s = wordFromList1[0].wup_similarity(wordFromList2[0])
            if s is not None:  # wup_similarity can return None
                similarities.append(s)
print(max(similarities))
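The emptiness check silently skips words that have no synsets. If you would rather see which words those are, a small helper along these lines may be useful. This is only a sketch: split_by_coverage and the toy lookup are hypothetical names introduced here, and the toy data (including treating 'how' as missing) is made up so the example runs without the WordNet corpus.

```python
def split_by_coverage(words, lookup):
    """Split words into those with at least one sense and those with none."""
    known, missing = [], []
    for word in words:
        # An empty result from lookup() is what made [0] raise IndexError.
        if lookup(word):
            known.append(word)
        else:
            missing.append(word)
    return known, missing


# Hypothetical toy lookup standing in for wordnet.synsets (illustration only).
toy_senses = {"choose": ["choose.v.01"], "copy": ["copy.n.01", "copy.v.01"]}


def toy_lookup(word):
    return toy_senses.get(word, [])


known, missing = split_by_coverage(["choose", "copy", "how"], toy_lookup)
print("with synsets:", known)
print("without synsets:", missing)
```

With NLTK available, passing wordnet.synsets as lookup should list the words from list1 and list2 that the similarity loop will skip.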