I am using the nltk module in Python 2.7. Here is my code:
from nltk.corpus import wordnet as wn
listsyn1 = []
listsyn2 = []
for synset in wn.synsets('dog', pos=wn.NOUN):
    print synset.name()
    for lemma in synset.lemmas():
        listsyn1.append(lemma.name())
for synset in wn.synsets('paw', pos=wn.NOUN):
    print synset.name()
    for lemma in synset.lemmas():
        listsyn2.append(lemma.name())
countsyn1 = len(listsyn1)
countsyn2 = len(listsyn2)
sumofsimilarity = 0;
for firstgroup in listsyn1:
    for secondgroup in listsyn2:
        print(firstgroup.wup_similarity(secondgroup))
        sumofsimilarity = sumofsimilarity + firstgroup.wup_similarity(secondgroup)
averageofsimilarity = sumofsimilarity/(countsyn1*countsyn2)
When I try to run this code, I get the error "AttributeError: 'unicode' object has no attribute 'wup_similarity'". Thanks for your help.
Answer 0 (score: 2)
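The similarity measures can only be accessed from Synset objects, not from Lemma objects or from lemma_names (i.e. str type; in Python 2.7 the names are unicode strings, which is why your error mentions 'unicode').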
dog = wn.synsets('dog', 'n')[0]
paw = wn.synsets('paw', 'n')[0]
print(type(dog), type(paw), dog.wup_similarity(paw))
[OUT]:
<class 'nltk.corpus.reader.wordnet.Synset'> <class 'nltk.corpus.reader.wordnet.Synset'> 0.21052631578947367
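Applied to the question's code, the fix is to keep the Synset objects themselves instead of their lemma names. Below is a minimal sketch under a few assumptions: the helper name `average_similarity` is hypothetical, pairs where the similarity function returns None are skipped (NLTK's `wup_similarity` can return None when two synsets have no common ancestor), and the average is taken over the pairs that actually produced a score.

```python
from itertools import product


def average_similarity(group1, group2, sim):
    """Average sim(a, b) over all pairs (a, b), skipping pairs where sim returns None.

    Hypothetical helper: `sim` is any two-argument similarity function,
    e.g. lambda a, b: a.wup_similarity(b) for NLTK Synset objects.
    """
    scores = [sim(a, b) for a, b in product(group1, group2)]
    scores = [s for s in scores if s is not None]
    if not scores:
        return None
    # Divide by the number of scored pairs rather than len(group1) * len(group2),
    # so pairs that returned None do not drag the average down.
    return float(sum(scores)) / len(scores)


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        # Keep the Synset objects, not lemma.name() strings.
        dog_synsets = wn.synsets('dog', pos=wn.NOUN)
        paw_synsets = wn.synsets('paw', pos=wn.NOUN)
        print(average_similarity(dog_synsets, paw_synsets,
                                 lambda a, b: a.wup_similarity(b)))
    except (ImportError, LookupError):
        # NLTK or its WordNet corpus is not installed in this environment.
        print("NLTK WordNet data not available")
```

Passing the similarity function in as a parameter keeps the averaging logic independent of WordNet, so the same helper works for `path_similarity` or `lch_similarity` as well.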
When you call .lemmas() on a Synset and then access .name() on each of the resulting Lemma objects, you get a str:
dog = wn.synsets('dog', 'n')[0]
print(type(dog), dog)
print(type(dog.lemmas()[0]), dog.lemmas()[0])
print(type(dog.lemmas()[0].name()), dog.lemmas()[0].name())
[OUT]:
<class 'nltk.corpus.reader.wordnet.Synset'> Synset('dog.n.01')
<class 'nltk.corpus.reader.wordnet.Lemma'> Lemma('dog.n.01.dog')
<class 'str'> dog
You can use the hasattr function to check which objects/types have access to a given function or attribute:
dog = wn.synsets('dog', 'n')[0]
print(hasattr(dog, 'wup_similarity'))
print(hasattr(dog.lemmas()[0], 'wup_similarity'))
print(hasattr(dog.lemmas()[0].name(), 'wup_similarity'))
[OUT]:
True
False
False
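The same hasattr check can be turned into a guard so that passing a lemma name by mistake fails with a clearer message than the bare AttributeError. This is a minimal, hypothetical sketch (the name `checked_wup` is not part of NLTK); it only duck-types on the presence of a `wup_similarity` attribute and does not verify that the objects really are Synsets.

```python
def checked_wup(a, b):
    """Return a.wup_similarity(b), raising a clearer TypeError when either
    argument lacks a wup_similarity attribute (e.g. a str/unicode lemma name
    or a Lemma object instead of a Synset).

    Hypothetical helper for illustration; it duck-types on the attribute only.
    """
    for obj in (a, b):
        if not hasattr(obj, 'wup_similarity'):
            raise TypeError(
                "expected a Synset, got %s %r; look it up with wn.synsets(...) first"
                % (type(obj).__name__, obj))
    return a.wup_similarity(b)
```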
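Most likely, you want a function similar to https://github.com/alvations/pywsd/blob/master/pywsd/similarity.py#L76, which maximizes wup_similarity across pairs of synsets, but note that there are many caveats that need to be formalized beforehand.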
So I think you want to avoid that by using .lemma_names(). Perhaps you could do something like this:
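from itertools import chain, product
from nltk.corpus import wordnet as wn

def ss_lnames(word):
    # Collect the lemma names of every noun synset of `word`.
    return set(chain(*[ss.lemma_names() for ss in wn.synsets(word, 'n')]))

dog_lnames = ss_lnames('dog')
paw_lnames = ss_lnames('paw')
for dog_name, paw_name in product(dog_lnames, paw_lnames):
    # Look each lemma name up again as Synsets, since only Synsets have wup_similarity.
    for dog_ss, paw_ss in product(wn.synsets(dog_name, 'n'), wn.synsets(paw_name, 'n')):
        print(dog_ss, paw_ss, dog_ss.wup_similarity(paw_ss))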
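But the results will most likely be uninterpretable and unreliable, because no word sense disambiguation is done before the synset lookups in either the outer or the inner loop.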
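If a single score per word pair is enough, the pywsd-style idea mentioned above (take the maximum wup_similarity over all synset pairs) can be sketched as follows. This is a hypothetical helper, not pywsd's actual implementation; it skips pairs that score None and returns the best score together with the pair that produced it, so you can at least inspect which senses were matched.

```python
from itertools import product


def max_similarity(group1, group2, sim):
    """Return (best_score, a, b) maximizing sim(a, b) over all pairs.

    Pairs where sim returns None are ignored; returns None if no pair scores.
    Hypothetical helper for illustration, not pywsd's implementation.
    """
    best = None
    for a, b in product(group1, group2):
        score = sim(a, b)
        if score is None:
            continue
        if best is None or score > best[0]:
            best = (score, a, b)
    return best


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        print(max_similarity(wn.synsets('dog', 'n'), wn.synsets('paw', 'n'),
                             lambda a, b: a.wup_similarity(b)))
    except (ImportError, LookupError):
        # NLTK or its WordNet corpus is not installed in this environment.
        print("NLTK WordNet data not available")
```

Taking the maximum sidesteps averaging over irrelevant senses, but it is still no substitute for word sense disambiguation: the winning pair may not be the senses you actually meant.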