Word sense disambiguation in NLTK Python

Date: 2010-09-13 11:04:27

Tags: python nltk

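I am new to NLTK Python and I am looking for a sample application that can do word sense disambiguation. My searches turned up plenty of algorithms, but no sample application. I just want to pass in a sentence and learn the sense of each word by consulting the WordNet library. Thanks.

I found a similar module in Perl: http://marimba.d.umn.edu/allwords/allwords.html Is there such a module in NLTK Python?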

6 answers:

Answer 0 (score: 12)

Recently, part of the pywsd code has been ported into the latest version of NLTK, in the wsd.py module. Try:

>>> from nltk.wsd import lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> lesk(sent, ambiguous)
Synset('bank.v.04')
>>> lesk(sent, ambiguous).definition()
u'act as the banker in a game or in gambling'

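For better WSD performance, use the pywsd library instead of the NLTK module. In general, simple_lesk() from pywsd does better than lesk from NLTK. I will try to update the NLTK module when I have free time.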


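In response to Chris Spencer's comment, please note the limitations of the Lesk algorithm. I am simply giving a faithful implementation of the algorithm. It is not a silver bullet; see http://en.wikipedia.org/wiki/Lesk_algorithm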
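To make that limitation concrete, here is a minimal sketch of the simplified Lesk idea: pick the sense whose gloss shares the most words with the context. It does not use NLTK or pywsd. The sense names, glosses, stopword list, and the `toy_lesk` helper are all made up for illustration.

```python
# Simplified Lesk on a tiny hand-written sense inventory.
# The glosses below are hypothetical and are NOT taken from WordNet.
TOY_SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a river or other body of water",
    },
}

# A very small, assumed stopword list.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "that", "my", "i", "was", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def toy_lesk(sentence, word, senses=TOY_SENSES):
    """Return (sense_name, overlap) for the gloss that best overlaps the context."""
    context = tokenize(sentence) - {word}
    best_name, best_overlap = None, -1
    for name, gloss in senses[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_name, best_overlap = name, overlap
    return best_name, best_overlap


print(toy_lesk("I went to the bank to deposit my money", "bank"))
print(toy_lesk("The river bank was full of dead fishes", "bank"))
# No word overlaps any gloss here, so the first listed sense wins by default.
print(toy_lesk("She sat by the bank", "bank"))
```

The last call shows the weakness: when the context shares no words with any gloss, the choice is arbitrary. Note also that "deposit" does not match "deposits" without stemming, which is one reason stemmed variants tend to do better.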

Also note that although NLTK's

lesk("My cat likes to eat mice.", "cat", "n")

does not give the correct answer, you can use the pywsd implementation instead.

@Chris, if you want a python setup.py, just ask politely and I will write it...

Answer 1 (score: 7)

Answer 2 (score: 7)

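Yes. In fact, a book written by the NLTK team has several chapters on classification and explicitly covers how to use WordNet. You can also buy a physical copy of the book from Safari.

FYI: NLTK was written by natural language programming academics for use in their introductory programming courses.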

Answer 3 (score: 3)

As a practical answer to the OP's request, here is a Python implementation of several WSD methods that return senses in the form of NLTK synset(s): https://github.com/alvations/pywsd

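It includes:

  • Lesk algorithms (including original Lesk, adapted Lesk, and simple Lesk)
  • Baseline algorithms (random sense, first sense, most frequent sense)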
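For readers unfamiliar with the three baselines, the sketch below shows what each one does on a toy sense inventory. These are not pywsd's implementations. The `toy_` function names, the sense labels, and the frequency counts are invented for illustration.

```python
import random

# Hypothetical inventory: senses appear in dictionary order, each paired
# with a made-up corpus frequency count.
TOY_INVENTORY = {
    "plant": [
        ("plant.factory", 12),
        ("plant.organism", 30),
        ("plant.verb", 5),
    ],
}


def toy_random_sense(word, rng=random):
    """Random-sense baseline: pick any listed sense uniformly at random."""
    return rng.choice(TOY_INVENTORY[word])[0]


def toy_first_sense(word):
    """First-sense baseline: always return the first listed sense."""
    return TOY_INVENTORY[word][0][0]


def toy_most_frequent_sense(word):
    """Most-frequent-sense baseline: return the sense with the highest count."""
    return max(TOY_INVENTORY[word], key=lambda pair: pair[1])[0]


print(toy_random_sense("plant", random.Random(0)))
print(toy_first_sense("plant"))
print(toy_most_frequent_sense("plant"))
```

None of the baselines looks at the surrounding sentence, which is why they serve as a floor against which context-aware methods such as Lesk are compared.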

It can be used like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

bank_sents = ['I went to the bank to deposit my money',
'The river bank was full of dead fishes']

plant_sents = ['The workers at the industrial plant were overworked',
'The plant was no longer bearing flowers']

print "======== TESTING simple_lesk ===========\n"
from lesk import simple_lesk
print "#TESTING simple_lesk() ..."
print "Context:", bank_sents[0]
answer = simple_lesk(bank_sents[0],'bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS ..."
print "Context:", bank_sents[1]
answer = simple_lesk(bank_sents[1],'bank','n')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING simple_lesk() with POS and stems ..."
print "Context:", plant_sents[0]
answer = simple_lesk(plant_sents[0],'plant','n', True)
print "Sense:", answer
print "Definition:",answer.definition
print

print "======== TESTING baseline ===========\n"
from baseline import random_sense, first_sense
from baseline import max_lemma_count as most_frequent_sense

print "#TESTING random_sense() ..."
print "Context:", bank_sents[0]
answer = random_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING first_sense() ..."
print "Context:", bank_sents[0]
answer = first_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

print "#TESTING most_frequent_sense() ..."
print "Context:", bank_sents[0]
answer = most_frequent_sense('bank')
print "Sense:", answer
print "Definition:",answer.definition
print

[OUT]:

======== TESTING simple_lesk ===========

#TESTING simple_lesk() ...
Context: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition: a financial institution that accepts deposits and channels the money into lending activities

#TESTING simple_lesk() with POS ...
Context: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING simple_lesk() with POS and stems ...
Context: The workers at the industrial plant were overworked
Sense: Synset('plant.n.01')
Definition: buildings for carrying on industrial labor

======== TESTING baseline ===========
#TESTING random_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('deposit.v.02')
Definition: put into a bank account

#TESTING first_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

#TESTING most_frequent_sense() ...
Context: I went to the bank to deposit my money
Sense: Synset('bank.n.01')
Definition: sloping land (especially the slope beside a body of water)

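Answer 4 (score: 0)

NLTK has an API for accessing WordNet. WordNet groups words into sets of synonyms (synsets). This gives you information about a word, such as its hypernyms, hyponyms, root word, and so on.

"Python Text Processing with NLTK 2.0 Cookbook" is a good book for getting started with the various features of NLTK. It is easy to read, understand, and apply.

You can also look at other papers (outside the NLTK world) that discuss using Wikipedia for word sense disambiguation.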

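Answer 5 (score: -1)

Yes, this is possible with the wordnet module in NLTK. The similarity measures used by the tool mentioned in your post are also available in the NLTK wordnet module.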
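As a rough illustration of how a similarity measure can drive disambiguation, the sketch below computes a path-based similarity, 1 / (1 + shortest path length), over a tiny hand-made hypernym tree. It then picks the sense of the target word that is most similar to any sense of a context word. The taxonomy, the sense names, and the helper functions are assumptions for this example and do not come from WordNet or NLTK.

```python
# Hypothetical "is-a" links: each node maps to its single parent.
TOY_HYPERNYM = {
    "cat.feline": "mammal",
    "mouse.rodent": "mammal",
    "mammal": "animal",
    "animal": "entity",
    "cat.tractor": "vehicle",
    "mouse.device": "device",
    "vehicle": "artifact",
    "device": "artifact",
    "artifact": "entity",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root of the toy tree."""
    path = [node]
    while path[-1] in TOY_HYPERNYM:
        path.append(TOY_HYPERNYM[path[-1]])
    return path


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_from_a, n in enumerate(path_to_root(a)):
        if n in steps_from_b:  # first shared ancestor gives the shortest path
            return 1.0 / (1 + steps_from_a + steps_from_b[n])
    return 0.0


def most_similar_sense(target_senses, context_senses):
    """Pick the target sense with the highest similarity to any context sense."""
    return max(
        target_senses,
        key=lambda s: max(path_similarity(s, c) for c in context_senses),
    )


print(path_similarity("cat.feline", "mouse.rodent"))
print(most_similar_sense(["cat.tractor", "cat.feline"],
                         ["mouse.rodent", "mouse.device"]))
```

In NLTK, the corresponding real call is a synset's path_similarity() method, which works on the actual WordNet graph rather than a toy tree.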