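Some similarity scores fall between 0 and 1, for example shortest path and WuP, so the similarity between car and automobile is 1. Other measures, such as LCH, instead give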
lch( car, automobile ) = 3.6889
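I would like to know the maximum score for these measures. Is 3.6889 considered the maximum? Does that mean LCH scores range from 0 to 3.6889?
I have also included the following measures: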
jcn( car, automobile ) = 12876699.5
res( car, automobile ) = 9.3679
lesk( car, automobile ) = 9519
Answer 0 (score: 3)
It seems that 3.6375861597263857 is the maximum value of lch_similarity (I can't get 3.6889...). According to {{3}}, lch_similarity has the following properties:
Leacock Chodorow Similarity:
Return a score denoting how similar two word senses are, based on the
shortest path that connects the senses (as above) and the maximum depth
of the taxonomy in which the senses occur. The relationship is given as
-log(p/2d) where p is the shortest path length and d is the taxonomy
depth.
...
:return: A score denoting the similarity of the two ``Synset`` objects,
normally greater than 0. None is returned if no connecting path
could be found. If a ``Synset`` is compared with itself, the
maximum score is returned, which varies depending on the taxonomy
depth.
Given that rock_hind.n.01 sits at the deepest level (19) of the WordNet taxonomy and change.n.06 at the shallowest (2), we can try the measure at different depths:
>>> from nltk.corpus import wordnet as wn
>>> rock = wn.synset('rock_hind.n.01')
>>> change = wn.synset('change.n.06')
>>> rock.lch_similarity(rock)
3.6375861597263857
>>> change.lch_similarity(change)
3.6375861597263857
>>> change.lch_similarity(rock)
0.7472144018302211
>>> rock.lch_similarity(change)
0.7472144018302211
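The self-similarity value is consistent with the -log(p/2d) formula from the docstring. Here is a minimal sketch, assuming p is counted in nodes (edges + 1) so that a synset compared with itself has p = 1, and that d = 19 for nouns. The helper names `lch_score` and `implied_path_nodes` are made up for illustration and are not part of NLTK. Under those assumptions the maximum is simply log(2d), and the 3.6889 from the question equals log(40), which would correspond to a tool that uses d = 20. That last reading is a guess, not something confirmed here.

```python
import math


def lch_score(path_edges, max_depth):
    """Leacock-Chodorow score -log(p / 2d).

    Assumption: p is the path length in nodes, i.e. edges + 1.
    """
    return -math.log((path_edges + 1) / (2.0 * max_depth))


def implied_path_nodes(score, max_depth):
    """Invert the formula: recover p from a score and a taxonomy depth."""
    return 2.0 * max_depth * math.exp(-score)


# A synset compared with itself: zero edges apart, so the score is log(2d).
print(lch_score(0, 19))  # about 3.6376, the maximum seen above
print(lch_score(0, 20))  # about 3.6889, the value from the question

# The rock/change score above implies a path of about 18 nodes when d = 19.
print(implied_path_nodes(0.7472144018302211, 19))
```

So the upper bound is not a universal constant: it depends on the taxonomy depth the implementation uses.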
Similar experiments can be run for the other measures, where the range appears to be much larger:
>>> from nltk.corpus import wordnet_ic, genesis
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
>>> genesis_ic = wn.ic(genesis, False, 0.0)
>>> rock.res_similarity(rock, brown_ic) # res_similarity, brown
1e+300
>>> rock.res_similarity(change, brown_ic)
-0.0
>>> rock.res_similarity(rock, semcor_ic) # res_similarity, semcor
1e+300
>>> rock.res_similarity(change, semcor_ic)
-0.0
>>> rock.res_similarity(rock, genesis_ic) # res_similarity, genesis
1e+300
>>> rock.res_similarity(change, genesis_ic)
-0.08306855877006339
>>> change.res_similarity(rock, genesis_ic)
-0.08306855877006339
>>> rock.jcn_similarity(rock, brown_ic) # jcn, brown - results are identical with semcor and genesis
1e+300
>>> rock.jcn_similarity(change, brown_ic)
1e-300
>>> change.jcn_similarity(rock, brown_ic)
1e-300
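The extreme values above make more sense once the measures are written out in terms of information content (IC). Below is a minimal sketch of the standard Resnik and Jiang-Conrath definitions, not NLTK's actual code. It assumes that a concept with a zero corpus count is given a large sentinel IC (1e300 here, chosen only because it matches the outputs above). All counts are made up for illustration.

```python
import math

INF = 1e300  # assumed sentinel for concepts with no corpus evidence


def information_content(count, total):
    """IC = -log(count / total); zero-count concepts get the sentinel."""
    if count == 0:
        return INF
    return -math.log(count / total)


def res_score(ic_lcs):
    """Resnik: the IC of the least common subsumer (LCS)."""
    return ic_lcs


def jcn_score(ic_a, ic_b, ic_lcs):
    """Jiang-Conrath: 1 / (IC(a) + IC(b) - 2 * IC(LCS))."""
    diff = ic_a + ic_b - 2 * ic_lcs
    if diff == 0:
        return INF
    return 1.0 / diff


# Hypothetical counts: the root covers the whole corpus, one synset is unseen.
total = 1000
ic_root = information_content(total, total)  # -log(1.0), printed as -0.0
ic_unseen = information_content(0, total)    # sentinel, 1e300
ic_common = information_content(50, total)   # about 3.0

print(res_score(ic_unseen))                      # unseen synset vs itself
print(res_score(ic_root))                        # LCS is the root
print(jcn_score(ic_unseen, ic_unseen, ic_unseen))
print(jcn_score(ic_unseen, ic_common, ic_root))
```

Read this way, res and jcn have no fixed maximum like lch: the top of the range depends on the corpus counts, and the 1e+300 and 1e-300 values look like artifacts of a synset that never occurs in the corpus rather than meaningful scores.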