Calculating distances in WordNet

Time: 2018-06-01 15:40:26

Tags: python nltk wordnet

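Using Python 3.5, NLTK, and WordNet (latest version), I am computing wup_similarity() between all pairs of synsets for the words "continuous" and "ongoing". Every distance comes back as None, even though the plain-English meanings of the two words seem similar. I also noticed that the parts of speech of these words are "adjective" and "adjective satellite", respectively.

Is the calculation failing because the parts of speech differ? Is there some way to combine these two parts of speech to work around the problem?

Thanks in advance. My code snippet is shown below.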

import nltk
from nltk.corpus import wordnet as wn
sl1 = wn.synsets("continuous")
sl2 = wn.synsets("ongoing")
for x in sl1:
    for y in sl2:
        print(x, y, x.wup_similarity(y))
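
To double-check the part-of-speech observation, a small sketch like the one below could group the synsets by their POS tag. It only assumes that each object exposes a pos() method, as NLTK's Synset does; the helper name group_by_pos is hypothetical and not part of NLTK.

```python
from collections import defaultdict


def group_by_pos(synsets):
    """Group synset-like objects by the tag returned from their pos() method.

    Hypothetical helper: it relies only on each item having a pos() method.
    """
    groups = defaultdict(list)
    for s in synsets:
        groups[s.pos()].append(s)
    return dict(groups)
```

With NLTK installed and the WordNet corpus downloaded, group_by_pos(wn.synsets("continuous")) should show which tags occur; WordNet uses 'a' for adjectives and 's' for adjective satellites.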

1 answer:

Answer 0: (score: 0)

There should not be any "failure" in the sense of an error being raised.

It can, however, return None, e.g.

from nltk.corpus import wordnet as wn
sl1 = wn.synsets("dog")
sl2 = wn.synsets("cat")
for x in sl1:
    for y in sl2:
        print(x, y, x.wup_similarity(y))

[OUT]:

Synset('dog.n.01') Synset('cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('guy.n.01') 0.631578947368421
Synset('dog.n.01') Synset('cat.n.03') 0.631578947368421
Synset('dog.n.01') Synset('kat.n.01') 0.25
Synset('dog.n.01') Synset('cat-o'-nine-tails.n.01') 0.42105263157894735
Synset('dog.n.01') Synset('caterpillar.n.02') 0.4
Synset('dog.n.01') Synset('big_cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('computerized_tomography.n.01') 0.1
Synset('dog.n.01') Synset('cat.v.01') None
Synset('dog.n.01') Synset('vomit.v.01') None
Synset('frump.n.01') Synset('cat.n.01') 0.48
Synset('frump.n.01') Synset('guy.n.01') 0.5714285714285714
Synset('frump.n.01') Synset('cat.n.03') 0.5714285714285714
Synset('frump.n.01') Synset('kat.n.01') 0.4
...

You can simply check for None and set the score to zero in that case, e.g.

sl1 = wn.synsets("dog")
sl2 = wn.synsets("cat")
for x in sl1:
    for y in sl2:
        score = x.wup_similarity(y)
        score = score if score else 0
        print(x, y, score)

[OUT]:

Synset('dog.n.01') Synset('cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('guy.n.01') 0.631578947368421
Synset('dog.n.01') Synset('cat.n.03') 0.631578947368421
Synset('dog.n.01') Synset('kat.n.01') 0.25
Synset('dog.n.01') Synset('cat-o'-nine-tails.n.01') 0.42105263157894735
Synset('dog.n.01') Synset('caterpillar.n.02') 0.4
Synset('dog.n.01') Synset('big_cat.n.01') 0.8571428571428571
Synset('dog.n.01') Synset('computerized_tomography.n.01') 0.1
Synset('dog.n.01') Synset('cat.v.01') 0
Synset('dog.n.01') Synset('vomit.v.01') 0
Synset('frump.n.01') Synset('cat.n.01') 0.48
Synset('frump.n.01') Synset('guy.n.01') 0.5714285714285714
Synset('frump.n.01') Synset('cat.n.03') 0.5714285714285714
Synset('frump.n.01') Synset('kat.n.01') 0.4
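
The None check above could also be wrapped in small helpers, sketched here under a few assumptions. The names safe_wup and best_wup are hypothetical (they are not NLTK functions), and the code only assumes that each object has a wup_similarity() method that may return None. Unlike `score if score else 0`, it tests `is None` explicitly, so a genuine score of 0.0 is not confused with a missing one.

```python
def safe_wup(x, y, default=0.0):
    """Return x.wup_similarity(y), or `default` when the result is None.

    Hypothetical helper; x and y are expected to be synset-like objects.
    """
    score = x.wup_similarity(y)
    return default if score is None else score


def best_wup(synsets_a, synsets_b, default=0.0):
    """Return the highest pairwise score between two synset lists.

    Falls back to `default` when either list is empty.
    """
    scores = [safe_wup(x, y, default) for x in synsets_a for y in synsets_b]
    return max(scores) if scores else default
```

With NLTK available, best_wup(wn.synsets("dog"), wn.synsets("cat")) would give a single score for the word pair. For pairs where every synset comparison returns None, as in the "continuous" / "ongoing" case from the question, it would simply return the default.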