How do I get the correct synsets from raw text?

Time: 2015-01-18 23:18:21

Tags: nltk wordnet lemmatization pos-tagger

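I need to build a table of the relations between the words (synsets) of any raw text, using the path_similarity method. The problem is that a single word can map to many synsets: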

>>> from nltk.corpus import wordnet as wn
>>> sent = "I went to the bank to deposit money".split()
>>> wn.synsets('bank')
[Synset('bank.n.01'), Synset('depository_financial_institution.n.01'), Synset('bank.n.03'), Synset('bank.n.04'), Synset('bank.n.05'), Synset('bank.n.06'), Synset('bank.n.07'), Synset('savings_bank.n.02'), Synset('bank.n.09'), Synset('bank.n.10'), Synset('bank.v.01'), Synset('bank.v.02'), Synset('bank.v.03'), Synset('bank.v.04'), Synset('bank.v.05'), Synset('deposit.v.02'), Synset('bank.v.07'), Synset('trust.v.01')]
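
For context, the table itself looks like the easy part once a single synset has been picked for each word. A minimal sketch, assuming a hypothetical helper named `similarity_table` (not part of NLTK) that relies only on each item having a `path_similarity(other)` method, as NLTK's Synset objects do:

```python
from itertools import combinations


def similarity_table(synsets):
    """Map each unordered pair of synsets to its path_similarity score.

    Hypothetical helper, not an NLTK function. It only assumes that every
    item exposes path_similarity(other). With real Synset objects a score
    can be None when no connecting path is found.
    """
    return {(a, b): a.path_similarity(b) for a, b in combinations(synsets, 2)}


# Possible usage with real WordNet data (needs nltk plus the wordnet corpus):
#   from nltk.corpus import wordnet as wn
#   chosen = [wn.synset('depository_financial_institution.n.01'),
#             wn.synset('money.n.01')]
#   table = similarity_table(chosen)
```

The hard part is choosing which synset goes into that list for each word.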

How do I get the correct synset for each word in the raw text?

I can get the lemmas and POS tags like this:

>>> from nltk import pos_tag
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('banks')
u'bank'
>>> pos_tag(['banks'])
[('banks', 'NNS')]
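
The POS tag can at least narrow down the candidates. A rough sketch, assuming a hypothetical helper `penn_to_wordnet` (my own name, not an NLTK function) that maps the Penn Treebank tags returned by pos_tag onto the one-letter POS codes WordNet uses ('n', 'v', 'a', 'r'):

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None.

    Hypothetical helper: the prefix rules below are a common convention,
    not something NLTK provides under this name.
    """
    if tag.startswith('NN'):
        return 'n'  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith('VB'):
        return 'v'  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith('JJ'):
        return 'a'  # adjectives: JJ, JJR, JJS
    if tag.startswith('RB'):
        return 'r'  # adverbs: RB, RBR, RBS
    return None     # WordNet has no entries for other tags (TO, IN, DT, ...)


# Possible usage (needs nltk plus the wordnet corpus):
#   pos = penn_to_wordnet('NNS')           # 'n'
#   lemma = wnl.lemmatize('banks', pos=pos)
#   candidates = wn.synsets(lemma, pos=pos)
```

This only filters out the synsets with the wrong part of speech; for 'bank' it still leaves several noun senses to choose from.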

But how do I get the correct synset / sense number?

0 answers:

No answers