I am fairly new to NLTK. I am trying to find a solution to a problem I am currently working on:
Thanks.
Answer 0 (score: 3)
Is it also possible to find the list of synsets that contain a given word?
Yes:
>>> from nltk.corpus import wordnet as wn
>>> auto, car = 'auto', 'car'
>>> wn.synsets(auto)
[Synset('car.n.01')]
>>> wn.synsets(car)
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
If we look at the lemmas of each synset returned by wn.synsets(car), we find that "car" appears as one of the lemmas:
>>> for ss in wn.synsets(car):
...     assert 'car' in ss.lemma_names()
...
>>> for ss in wn.synsets(car):
...     print('car' in ss.lemma_names(), ss.lemma_names())
...
True ['car', 'auto', 'automobile', 'machine', 'motorcar']
True ['car', 'railcar', 'railway_car', 'railroad_car']
True ['car', 'gondola']
True ['car', 'elevator_car']
True ['cable_car', 'car']
Note: a lemma is not the same thing as a surface word; see Stemmers vs Lemmatizers. You may also find this useful: https://github.com/alvations/pywsd/blob/master/pywsd/utils.py#L66 (disclaimer: shameless plug).
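If you want this check as a reusable function, here is a minimal sketch. The name `synsets_with_lemma` and the `lookup` parameter are my own and not part of NLTK; `lookup` defaults to `wn.synsets` and only exists so the function can be tried with a stand-in. The lemma comparison is done on a lowercased, underscore-joined form, because WordNet stores multiword lemmas with underscores (for example `cable_car`).

```python
def synsets_with_lemma(word, lookup=None):
    """Return the synsets whose lemma names literally include `word`.

    `lookup` is a hypothetical hook (not an NLTK parameter): any callable
    mapping a word to a list of synset-like objects with `lemma_names()`.
    """
    if lookup is None:
        # Imported lazily so the helper can be exercised without WordNet data.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets

    # WordNet joins multiword lemmas with underscores, e.g. 'cable_car'.
    target = word.lower().replace(' ', '_')

    # Keep only synsets that list the normalized word among their lemmas.
    return [ss for ss in lookup(target)
            if target in (name.lower() for name in ss.lemma_names())]
```

The extra filter matters because `wn.synsets` may normalize an inflected form before looking it up, so a synset can be returned for a word that is not literally one of its lemma names.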
Given two words w1 and w2, is there a way to find out whether they belong to the same synset in the WordNet database?
Yes:
>>> from nltk.corpus import wordnet as wn
>>> auto, car = 'auto', 'car'
>>> wn.synsets(auto)
[Synset('car.n.01')]
>>> wn.synsets(car)
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
>>> auto_ss = set(wn.synsets(auto))
>>> car_ss = set(wn.synsets(car))
>>> car_ss.intersection(auto_ss)
{Synset('car.n.01')}
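The set intersection above can be wrapped into a small yes/no check. This is only a sketch: `shared_synsets`, `in_same_synset`, and the `lookup` parameter are names I made up for illustration, with `lookup` defaulting to `wn.synsets`.

```python
def shared_synsets(w1, w2, lookup=None):
    """Return the set of synsets that both words belong to.

    `lookup` is a hypothetical hook (not an NLTK parameter): any callable
    mapping a word to an iterable of hashable synsets.
    """
    if lookup is None:
        # Imported lazily so the helper can be exercised without WordNet data.
        from nltk.corpus import wordnet as wn
        lookup = wn.synsets
    return set(lookup(w1)) & set(lookup(w2))


def in_same_synset(w1, w2, lookup=None):
    """True if the two words share at least one synset."""
    return bool(shared_synsets(w1, w2, lookup))
```

Based on the session above, `in_same_synset('auto', 'car')` should come out true, since `car.n.01` appears in both lists.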