Maximum score of WordNet-based similarity measures

Time: 2013-11-21 04:57:40

Tags: wordnet ws4py

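Some similarity scores, such as shortest path and WuP, fall between 0 and 1, so the similarity between car and automobile is 1. Other measures, such as LCH, give values like: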

lch( car, automobile ) = 3.6889

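I would like to know the maximum score for these measures. Is 3.6889 the maximum? Does that mean LCH scores range from 0 to 3.6889?

I also get the following for other measures: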

jcn( car, automobile ) = 12876699.5
res( car, automobile ) = 9.3679
lesk( car, automobile ) = 9519

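1 answer:

Answer 0 (score: 3)

It seems that 3.6375861597263857 is the maximum value of lch_similarity (I could not reproduce 3.6889...). According to its documentation, lch_similarity has the following properties: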

Leacock Chodorow Similarity:
        Return a score denoting how similar two word senses are, based on the
        shortest path that connects the senses (as above) and the maximum depth
        of the taxonomy in which the senses occur. The relationship is given as
        -log(p/2d) where p is the shortest path length and d is the taxonomy
        depth.
...
:return: A score denoting the similarity of the two ``Synset`` objects,
            normally greater than 0. None is returned if no connecting path
            could be found. If a ``Synset`` is compared with itself, the
            maximum score is returned, which varies depending on the taxonomy
            depth.
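
The formula quoted above can be evaluated directly to see how the maximum depends on the taxonomy depth. This is a minimal sketch, not the library's code: `lch_score` is a hypothetical helper, and it assumes that comparing a sense with itself gives a path length of p = 1. The depths 19 and 20 are illustrative values.

```python
import math

def lch_score(path_length, taxonomy_depth):
    """Evaluate -log(p / 2d) from the quoted docstring (hypothetical helper)."""
    return -math.log(path_length / (2.0 * taxonomy_depth))

# The maximum occurs at the smallest path length, assumed here to be p = 1.
for depth in (19, 20):
    print(depth, lch_score(1, depth))
```

Under these assumptions, d = 19 gives about 3.6376 and d = 20 gives about 3.6889, so the two maxima in this thread may simply reflect tools that count taxonomy depth differently.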

Given that rock_hind.n.01 sits at the deepest level of the WordNet taxonomy (19) and change.n.06 at the shallowest (2), we can try synsets at different depths:

>>> from nltk.corpus import wordnet as wn
>>> rock = wn.synset('rock_hind.n.01')
>>> change = wn.synset('change.n.06')
>>> rock.lch_similarity(rock)
3.6375861597263857
>>> change.lch_similarity(change)
3.6375861597263857
>>> change.lch_similarity(rock)
0.7472144018302211
>>> rock.lch_similarity(change)
0.7472144018302211
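
Both synsets return the same self-similarity because the formula uses the depth of the whole taxonomy, not the depth of the individual synset. As a rough check, the cross score can be inverted to recover the path length. This sketch assumes d = 19, a value inferred from the maximum score being log(38) rather than read from the library; `path_length_from_score` is a hypothetical helper.

```python
import math

TAXONOMY_DEPTH = 19  # assumption: inferred from the maximum score log(2 * 19)

def path_length_from_score(score, taxonomy_depth=TAXONOMY_DEPTH):
    """Invert score = -log(p / 2d) to recover p (hypothetical helper)."""
    return 2.0 * taxonomy_depth * math.exp(-score)

# Score reported above for rock_hind.n.01 versus change.n.06.
print(round(path_length_from_score(0.7472144018302211), 6))
```

With that assumed depth the recovered path length is 18, so even two synsets this far apart still score above 0.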

Similar experiments can be run for the other measures, where the range appears to be much larger:

>>> from nltk.corpus import wordnet_ic, genesis
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
>>> genesis_ic = wn.ic(genesis, False, 0.0)
>>> rock.res_similarity(rock, brown_ic) # res_similarity, brown
1e+300
>>> rock.res_similarity(change, brown_ic)
-0.0
>>> rock.res_similarity(rock, semcor_ic) # res_similarity, semcor
1e+300
>>> rock.res_similarity(change, semcor_ic)
-0.0
>>> rock.res_similarity(rock, genesis_ic) # res_similarity, genesis
1e+300
>>> rock.res_similarity(change, genesis_ic)
-0.08306855877006339
>>> change.res_similarity(rock, genesis_ic)
-0.08306855877006339
>>> rock.jcn_similarity(rock, brown_ic) # jcn, brown - results are identical with semcor and genesis
1e+300
>>> rock.jcn_similarity(change, brown_ic)
1e-300
>>> change.jcn_similarity(rock, brown_ic)
1e-300
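
The 1e+300 and 1e-300 values look like stand-ins for infinity and its reciprocal rather than ordinary scores. The sketch below uses the textbook definitions of the Resnik and Jiang-Conrath measures with hypothetical corpus counts. It is not NLTK's implementation, and the function names are illustrative only.

```python
import math

def information_content(count, total):
    """IC = -log(count / total); rarer concepts carry more information."""
    return -math.log(count / total)

def res_similarity(ic_lcs):
    """Resnik: the information content of the least common subsumer."""
    return ic_lcs

def jcn_similarity(ic_a, ic_b, ic_lcs):
    """Jiang-Conrath: reciprocal of the information-content distance."""
    distance = ic_a + ic_b - 2.0 * ic_lcs
    return math.inf if distance == 0 else 1.0 / distance

total = 1_000_000                           # hypothetical corpus size
rare = information_content(1, total)        # concept seen once
root = information_content(total, total)    # concept covering every occurrence

print(res_similarity(rare))              # grows with corpus size
print(res_similarity(root))              # -0.0, as in the transcript above
print(jcn_similarity(rare, rare, rare))  # identical concepts: unbounded
```

Under these definitions neither measure has a fixed maximum: Resnik is bounded only by the log of the corpus size, and Jiang-Conrath is unbounded for identical concepts, so the upper end of the range depends on the information-content file in use.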