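Suppose I have a word A and a word B, where I use B as a hint to what A means. For example, A = bass and B = music: given this word pair, we as humans immediately know which meaning of A is intended.
I know there are many algorithms that work on sentences. I'd like to know whether any algorithms have been developed to do WSD on just a pair of words.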
Answer 0 (score: 7)
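Word Sense Disambiguation (WSD) is the task of disambiguating a word given its context sentence/document. In the case of a two-token phrase, the context is essentially the other token, so the usual WSD algorithms still apply, just with a one-word context.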
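To make "the context is the other token" concrete, here is a minimal sketch of a Lesk-style gloss-overlap disambiguator for a word pair. Because a single cue word rarely appears verbatim in the target word's glosses, the sketch expands the context with the words from the cue's own glosses (roughly the idea behind extended Lesk variants). Everything here is a hypothetical illustration: the `SENSES` inventory, the sense IDs, the stopword list, and the helper names `tokens` and `disambiguate_pair` are made up, and the glosses are only loosely modeled on dictionary-style definitions. A real system would take its senses from WordNet instead.

```python
import re

# Hypothetical toy sense inventory (NOT real WordNet data): word -> {sense_id: gloss}.
SENSES = {
    "bass": {
        "bass.fish": "the lean flesh of a saltwater fish of the family Serranidae",
        "bass.voice": "the lowest adult male singing voice",
        "bass.instrument": "the member with the lowest range of a family of musical instruments",
    },
    "music": {
        "music.art": "an artistic form of auditory communication incorporating instrumental or vocal tones",
        "music.sound": "any agreeable sound produced by musical instruments or singing",
    },
}

# Small hypothetical stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "or", "by", "any", "with", "in", "and"}


def tokens(text):
    """Lowercase alphabetic tokens of a gloss, minus stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate_pair(target, cue, senses=SENSES):
    """Pick the sense of `target` whose gloss shares the most words with the cue's context."""
    # Context = the cue word itself plus the words of all of the cue's glosses (if known).
    context = {cue.lower()}
    for gloss in senses.get(cue, {}).values():
        context |= tokens(gloss)

    best_sense, best_score = None, -1
    for sense_id, gloss in senses[target].items():
        score = len(tokens(gloss) & context)
        if score > best_score:  # ties keep the earlier-listed sense
            best_sense, best_score = sense_id, score
    return best_sense


print(disambiguate_pair("bass", "music"))  # → bass.instrument
print(disambiguate_pair("bass", "fish"))   # → bass.fish
```

With the cue "music", the instrument sense shares "musical" and "instruments" with the expanded context while the voice sense only shares "singing", so the instrument sense wins. With the cue "fish" (not in the toy inventory), the context is just the word itself, which matches only the fish gloss.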
You could try out different WSD software; there is a list here: Anyone know of some good Word Sense Disambiguation software?
I'll use pywsd (https://github.com/alvations/pywsd) to give you an example:
$ wget https://github.com/alvations/pywsd/archive/master.zip
$ unzip master.zip
$ cd pywsd-master
$ python
Python 2.7.5+ (default, Feb 27 2014, 19:37:08)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lesk import simple_lesk
# disambiguating the word 'bass' given the context 'bass music'
>>> simple_lesk('bass music', 'bass')
Synset('bass.n.07')
>>> disambiguated = simple_lesk('bass music', 'bass')
>>> disambiguated.definition
<bound method Synset.definition of Synset('bass.n.07')>
>>> disambiguated.definition()
u'the member with the lowest range of a family of musical instruments'
Alternatively, if you have the latest version of NLTK, you can use its new WSD module (https://github.com/nltk/nltk/blob/develop/nltk/wsd.py):
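from nltk.wsd import lesk
# requires the WordNet data: nltk.download('wordnet')
# context_sentence should be an iterable of words, so tokenize the phrase first
disambiguated = lesk(context_sentence="bass music".split(), ambiguous_word="bass")
print(disambiguated.definition())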
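One detail worth knowing: as far as I can tell from the nltk/wsd.py source, lesk scores each candidate sense by intersecting set(context_sentence) with the words of definition().split(). A plain Python string is an iterable of characters, so passing "bass music" as a raw string compares characters, not words. The stdlib-only sketch below mimics just that scoring step to show the difference; the function name `gloss_overlap` is hypothetical and is not an NLTK API.

```python
def gloss_overlap(context_sentence, definition):
    """Count context items that also occur as whitespace-separated words of a gloss.

    Mimics only the scoring step of a simplified Lesk; this is not NLTK's code.
    """
    return len(set(context_sentence) & set(definition.split()))


gloss = "the member with the lowest range of a family of musical instruments"

# A raw string becomes a set of characters, so only one-letter gloss words can match.
print(gloss_overlap("bass music", gloss))        # → 1 (the letter 'a' matches the word "a")
# A token list compares whole words; "music" does not literally occur in the gloss.
print(gloss_overlap(["bass", "music"], gloss))   # → 0
```

The zero overlap for the token list also shows why a bare word pair gives plain Lesk very little to work with: the gloss says "musical", not "music", so some normalization (lemmatizing or stemming) or context expansion usually helps.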
(Disclaimer: I wrote the lesk modules in both pywsd and NLTK.)