If I have a list of words in Python, for example:
words = ["blue", "red", "ball"]
Is there a way to programmatically generate hypernyms for this set of words using WordNet?
Answer 0 (score: 1)
First, see https://stackoverflow.com/a/29478711/610569 for the difference between "senses" (synsets/concepts) and "words" (lemmas, in WordNet terms).
Given two synsets (NOT words), you can find their lowest common hypernym:
>>> from nltk.corpus import wordnet as wn
# A word can have multiple meanings (aka synsets)
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('cat')
[Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('cat')[0].definition()
u'feline mammal usually having thick soft fur and no ability to roar: domestic cats; wildcats'
>>> dog = wn.synsets('dog')[0]
>>> cat = wn.synsets('cat')[0]
>>> cat.lowest_common_hypernyms(dog)
[Synset('carnivore.n.01')]
See http://www.nltk.org/howto/wordnet_lch.html
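Is the lowest common hypernym reliable?
WordNet is a hand-crafted resource, so its reliability depends on why and how the synsets were created across the WordNet ontology.
Can I use this information for my NLP task?
Maybe... but most likely it is not useful.
Can more than two synsets be compared?
Not directly. You have to do multiple pairwise searches, e.g.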
>>> mouse = wn.synsets('mouse')[0]
>>> cat = wn.synsets('cat')[0]
>>> dog = wn.synsets('dog')[0]
>>> dog.lowest_common_hypernyms(cat)
[Synset('carnivore.n.01')]
>>> cat.lowest_common_hypernyms(mouse)
[Synset('placental.n.01')]
>>> dog.lowest_common_hypernyms(mouse)
[Synset('placental.n.01')]
>>> placental = dog.lowest_common_hypernyms(mouse)[0]
>>> carnivore = dog.lowest_common_hypernyms(cat)[0]
>>> placental.lowest_common_hypernyms(carnivore)
[Synset('placental.n.01')]
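But you can see that this is inefficient. It would be easier to write your own code that traverses the WordNet ontology and finds the lowest common hypernyms of N synsets at once, rather than pair by pair.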
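A minimal sketch of that idea follows. It is not NLTK's implementation: NLTK ranks candidates by depth from the root, whereas this version simply keeps every common hypernym that is not itself a hypernym of another common hypernym. The function names (`hypernym_closure`, `lowest_common_hypernyms_n`) and the toy taxonomy are made up for illustration, and the toy taxonomy is a simplified stand-in, not the real WordNet paths.

```python
def hypernym_closure(node, hypernyms):
    """Return the node plus every hypernym reachable from it."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in hypernyms(current):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lowest_common_hypernyms_n(nodes, hypernyms):
    """Lowest common hypernyms of any number of nodes, as a set.

    `hypernyms` is a callable mapping a node to its direct hypernyms.
    """
    closures = [hypernym_closure(node, hypernyms) for node in nodes]
    if not closures:
        return set()
    # Hypernyms shared by every input node.
    common = set.intersection(*closures)
    # Drop any shared hypernym that sits above another shared hypernym.
    dominated = set()
    for candidate in common:
        dominated |= hypernym_closure(candidate, hypernyms) - {candidate}
    return common - dominated


# Hypothetical toy taxonomy (simplified, NOT actual WordNet data).
TOY_HYPERNYMS = {
    "dog": ["canine"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": ["placental"],
    "mouse": ["rodent"],
    "rodent": ["placental"],
    "placental": ["mammal"],
    "mammal": [],
}


def toy_hypernyms(node):
    return TOY_HYPERNYMS.get(node, [])


print(lowest_common_hypernyms_n(["dog", "cat"], toy_hypernyms))           # {'carnivore'}
print(lowest_common_hypernyms_n(["dog", "cat", "mouse"], toy_hypernyms))  # {'placental'}
```

With NLTK, the same function should work on real synsets by passing something like `lambda s: s.hypernyms() + s.instance_hypernyms()` as the `hypernyms` argument, since synsets are hashable. Each closure is computed once per input, so the cost grows with N rather than with the number of pairs.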