I need to build a table of the relationships between words (synsets) taken from arbitrary raw text, using the path_similarity method.
>>> from nltk.corpus import wordnet as wn
>>> sent = "I went to the bank to deposit money".split()
>>> wn.synsets('bank')
[Synset('bank.n.01'), Synset('depository_financial_institution.n.01'), Synset('bank.n.03'), Synset('bank.n.04'), Synset('bank.n.05'), Synset('bank.n.06'), Synset('bank.n.07'), Synset('savings_bank.n.02'), Synset('bank.n.09'), Synset('bank.n.10'), Synset('bank.v.01'), Synset('bank.v.02'), Synset('bank.v.03'), Synset('bank.v.04'), Synset('bank.v.05'), Synset('deposit.v.02'), Synset('bank.v.07'), Synset('trust.v.01')]
How do I get the correct synset for each word in the raw text?
I can get the lemmas and POS tags like this:
>>> from nltk import pos_tag
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> wnl.lemmatize('banks')
u'bank'
>>> pos_tag(['banks'])
[('banks', 'NNS')]
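The POS tag can at least narrow the candidates to one part of speech. Below is a minimal sketch, assuming a simple two-letter prefix mapping from Penn Treebank tags to WordNet POS letters. The names PENN_PREFIX_TO_WN and penn_to_wn are hypothetical helpers, not part of NLTK; the letters are the values of wn.NOUN, wn.VERB, wn.ADJ and wn.ADV.

```python
# Penn Treebank tag prefix -> WordNet POS letter.
# The letters equal nltk.corpus.wordnet's constants:
# wn.NOUN == 'n', wn.VERB == 'v', wn.ADJ == 'a', wn.ADV == 'r'.
# Hypothetical helper table; the mapping is a rough heuristic.
PENN_PREFIX_TO_WN = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}


def penn_to_wn(tag):
    """Return the WordNet POS letter for a Penn Treebank tag, or None
    when the tag has no WordNet counterpart (e.g. 'TO', 'DT')."""
    return PENN_PREFIX_TO_WN.get(tag[:2])


# Tags in the shape returned by nltk.pos_tag; written by hand here
# so the sketch runs without any NLTK data installed.
tagged = [('banks', 'NNS'), ('went', 'VBD'), ('to', 'TO')]
wn_tagged = [(word, penn_to_wn(tag)) for word, tag in tagged]
```

The resulting letter can be passed as the pos argument, as in wn.synsets('bank', pos='n') or wnl.lemmatize('banks', pos='n'). For 'bank' that only drops the verb synsets from the list above and leaves the ten noun synsets, so it does not pick a sense by itself.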
But how do I get the correct synset / sense number?