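Using Python 3.5, NLTK and WordNet (latest versions), I am computing wup_similarity() between all pairs of synsets of the words "continuous" and "ongoing". Every score is None, even though the two words seem similar in plain English. I also noticed that the parts of speech of these words are "adjective" and "adjective satellite", respectively.
Does the calculation fail because the parts of speech differ? Is there some way to combine the two parts of speech to work around the problem?
Thanks in advance. My code snippet is below.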
import nltk
from nltk.corpus import wordnet as wn

sl1 = wn.synsets("continuous")
sl2 = wn.synsets("ongoing")
for x in sl1:
    for y in sl2:
        print(x, y, x.wup_similarity(y))
Answer:
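It shouldn't "fail" in the sense of raising an error. It does, however, return None when no similarity can be computed for a pair, for example when the two synsets have no common ancestor in the hypernym taxonomy (such as a noun and a verb):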
from nltk.corpus import wordnet as wn

sl1 = wn.synsets("dog")
sl2 = wn.synsets("cat")
for x in sl1:
    for y in sl2:
        print(x, y, x.wup_similarity(y))
[OUT]:
Synset('dog.n.01') Synset('cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('guy.n.01') 0.631578947368421
Synset('dog.n.01') Synset('cat.n.03') 0.631578947368421
Synset('dog.n.01') Synset('kat.n.01') 0.25
Synset('dog.n.01') Synset('cat-o'-nine-tails.n.01') 0.42105263157894735
Synset('dog.n.01') Synset('caterpillar.n.02') 0.4
Synset('dog.n.01') Synset('big_cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('computerized_tomography.n.01') 0.1
Synset('dog.n.01') Synset('cat.v.01') None
Synset('dog.n.01') Synset('vomit.v.01') None
Synset('frump.n.01') Synset('cat.n.01') 0.48
Synset('frump.n.01') Synset('guy.n.01') 0.5714285714285714
Synset('frump.n.01') Synset('cat.n.03') 0.5714285714285714
Synset('frump.n.01') Synset('kat.n.01') 0.4
...
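To see why some pairs come back as None, here is a minimal, self-contained sketch of the Wu-Palmer idea on a made-up toy taxonomy. The node names and the `PARENT` map are hypothetical, and this is not NLTK's actual implementation (which may count depths differently). The score is 2 × depth(LCS) / (depth(a) + depth(b)), where the LCS (lowest common subsumer) is the deepest ancestor the two nodes share; if they share no ancestor, there is nothing to plug into the formula, so the function returns None.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: each node maps to its parent (roots map to None).
# Nouns and verbs sit in separate trees, so a noun and a verb share no ancestor.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "act": None,
    "run": "act",
}


def path_to_root(node: str) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    # Depth counts the nodes on the path to the root, so a root has depth 1.
    return len(path_to_root(node))


def wup(a: str, b: str) -> Optional[float]:
    ancestors_b = set(path_to_root(b))
    # Walking upward from `a`, the first node that is also an ancestor of `b`
    # is the lowest common subsumer (this holds because every node has one parent).
    for node in path_to_root(a):
        if node in ancestors_b:
            return 2 * depth(node) / (depth(a) + depth(b))
    return None  # no common ancestor, so no score can be computed
```

With this toy data, `wup("dog", "cat")` is 2 × 2 / (3 + 3) ≈ 0.667, while `wup("dog", "run")` is None because the two nodes live in separate trees, mirroring the noun/verb rows in the output above.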
You can simply check for None and replace it with zero, e.g.
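from nltk.corpus import wordnet as wn

sl1 = wn.synsets("dog")
sl2 = wn.synsets("cat")
for x in sl1:
    for y in sl2:
        score = x.wup_similarity(y)
        # wup_similarity() returns None when no score can be computed; treat that as 0
        score = score if score is not None else 0
        print(x, y, score)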
[OUT]:
Synset('dog.n.01') Synset('cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('guy.n.01') 0.631578947368421
Synset('dog.n.01') Synset('cat.n.03') 0.631578947368421
Synset('dog.n.01') Synset('kat.n.01') 0.25
Synset('dog.n.01') Synset('cat-o'-nine-tails.n.01') 0.42105263157894735
Synset('dog.n.01') Synset('caterpillar.n.02') 0.4
Synset('dog.n.01') Synset('big_cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('computerized_tomography.n.01') 0.1
Synset('dog.n.01') Synset('cat.v.01') 0
Synset('dog.n.01') Synset('vomit.v.01') 0
Synset('frump.n.01') Synset('cat.n.01') 0.48
Synset('frump.n.01') Synset('guy.n.01') 0.5714285714285714
Synset('frump.n.01') Synset('cat.n.03') 0.5714285714285714
Synset('frump.n.01') Synset('kat.n.01') 0.4