How do I determine semantic hierarchies/relations when using NLTK?

Date: 2013-03-25 21:53:10

Tags: python semantics nltk wordnet

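I want to use NLTK and WordNet to work out the semantic relation between two words. For example, if I input "employee" and "waiter", it should return something showing that employee is more general than waiter. Or for "employee" and "worker", it should return that they are equal. Does anyone know how to do this?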

1 Answer:

Answer 0 (score: 6)

First, you have to solve the problem of getting from words to lemmas and then to synsets, i.e. how do you identify a synset from a word?

word => lemma => lemma.pos.sense => synset
Waiters => waiter => 'waiter.n.01' => wn.synset('waiter.n.01')
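To make the word => lemma => sense step concrete, here is a minimal sketch over a hypothetical toy lexicon rather than WordNet itself. `TOY_SENSES`, `lemmatize` and `to_sense` are illustrative names, and the plural-stripping rule is only a crude stand-in for NLTK's `wn.morphy()`. With NLTK you would normally call `wn.morphy(word, wn.NOUN)` and then `wn.synsets(lemma, pos=wn.NOUN)`; taking the first sense, as below, is just a common default heuristic.

```python
# Hypothetical toy lexicon (not real WordNet data): lemma -> sense names,
# listed in the order a sense inventory would rank them.
TOY_SENSES = {
    "waiter": ["waiter.n.01", "waiter.n.02"],
    "employee": ["employee.n.01"],
}

def lemmatize(word: str) -> str:
    """Crude stand-in for wn.morphy(): lowercase, then strip a plural 's'
    if that yields a known lemma."""
    w = word.lower()
    if w not in TOY_SENSES and w.endswith("s") and w[:-1] in TOY_SENSES:
        return w[:-1]
    return w

def to_sense(word: str) -> str:
    """word => lemma => lemma.pos.sense, defaulting to the first listed sense."""
    senses = TOY_SENSES.get(lemmatize(word))
    if not senses:
        raise KeyError(f"no senses found for {word!r}")
    return senses[0]

print(to_sense("Waiters"))  # waiter.n.01
```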

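So, assuming you have already dealt with the problem above and obtained the most appropriate representation of waiter, you can go on to compare synsets. Note that a word can have many synsets: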
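Because a word can map to several synsets, one option is to try every sense pair and keep the pairs that are linked in the hierarchy. The sketch below does that over hypothetical toy data (`SENSES`, `HYPERNYM`, `ancestors` and `linking_pairs` are illustrative names, not NLTK APIs) and assumes each sense has a single parent, whereas real WordNet synsets can have several hypernyms.

```python
from itertools import product

# Hypothetical toy data (not real WordNet): word -> senses,
# and sense -> its single direct hypernym (None at the root).
SENSES = {
    "waiter": ["waiter.n.01", "waiter.n.02"],
    "employee": ["employee.n.01"],
}
HYPERNYM = {
    "waiter.n.01": "employee.n.01",
    "waiter.n.02": "person.n.01",
    "employee.n.01": "person.n.01",
    "person.n.01": None,
}

def ancestors(sense):
    """Walk up the hypernym chain, excluding the sense itself."""
    chain = []
    parent = HYPERNYM.get(sense)
    while parent is not None:
        chain.append(parent)
        parent = HYPERNYM.get(parent)
    return chain

def linking_pairs(word_a, word_b):
    """Every (sense_a, sense_b) pair where sense_b is an ancestor of sense_a."""
    return [(a, b) for a, b in product(SENSES[word_a], SENSES[word_b])
            if b in ancestors(a)]

print(linking_pairs("waiter", "employee"))  # [('waiter.n.01', 'employee.n.01')]
```

In this toy data only `waiter.n.01` sits under `employee.n.01`, which is why picking the right sense before comparing matters.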

from nltk.corpus import wordnet as wn

# In NLTK 3, look synsets up with wn.synset(); lemma_names() is a method
waiter = wn.synset('waiter.n.01')
employee = wn.synset('employee.n.01')

# Lemma names of every synset below each one in the hyponym tree
all_hyponyms_of_waiter = {w.replace("_", " ") for s in waiter.closure(lambda s: s.hyponyms()) for w in s.lemma_names()}
all_hyponyms_of_employee = {w.replace("_", " ") for s in employee.closure(lambda s: s.hyponyms()) for w in s.lemma_names()}

if 'waiter' in all_hyponyms_of_employee:
    print('employee more general than waiter')
elif 'employee' in all_hyponyms_of_waiter:
    print('waiter more general than employee')
else:
    print("WordNet's hypernym hierarchy doesn't place employee and waiter on the same branch")
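The check above generalises to a small classifier that also covers the "equal" case from the question. This is a sketch over a hypothetical toy hierarchy, not WordNet data; `HYPONYMS`, `LEMMAS`, `hyponym_closure`, `relation` and `word_relation` are illustrative names. `hyponym_closure` mimics what `Synset.closure(lambda s: s.hyponyms())` does in NLTK (a breadth-first walk that excludes the starting synset), and two words are treated as equal when they share a synset.

```python
from collections import deque

# Hypothetical toy hierarchy (not real WordNet data): synset -> direct hyponyms.
HYPONYMS = {
    "person.n.01": ["employee.n.01", "customer.n.01"],
    "employee.n.01": ["waiter.n.01"],
    "customer.n.01": [],
    "waiter.n.01": [],
}
# Toy synset membership: words in the same synset are treated as synonyms.
LEMMAS = {
    "person.n.01": ["person", "individual"],
    "employee.n.01": ["employee"],
    "customer.n.01": ["customer", "client"],
    "waiter.n.01": ["waiter", "server"],
}

def hyponym_closure(synset):
    """All synsets below `synset` (breadth-first), excluding `synset` itself."""
    seen = set()
    queue = deque(HYPONYMS.get(synset, []))
    while queue:
        s = queue.popleft()
        if s not in seen:
            seen.add(s)
            queue.extend(HYPONYMS.get(s, []))
    return seen

def relation(a, b):
    """Return 'equal', 'a_more_general', 'b_more_general' or 'unrelated'."""
    if a == b:
        return "equal"
    if b in hyponym_closure(a):
        return "a_more_general"
    if a in hyponym_closure(b):
        return "b_more_general"
    return "unrelated"

def word_relation(w1, w2):
    """Map each word to the first toy synset listing it, then compare synsets."""
    def find(word):
        return next(s for s, names in LEMMAS.items() if word in names)
    return relation(find(w1), find(w2))

print(word_relation("employee", "waiter"))  # a_more_general
print(word_relation("waiter", "server"))    # equal
```

Comparing synset identities rather than lemma strings, as the answer's code does, avoids a false match when an unrelated sense happens to share the same surface word. Note that `find` raises `StopIteration` for a word missing from the toy data.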