Is pattern.en wordnet limited?

Asked: 2016-01-22 13:18:29

Tags: python nltk

I am trying to get synonyms for some words, but for certain words I get an error. Here is the code.

from pattern.en import wordnet as wn

def foo():
    ss = 'man'    
    s = wn.synsets(ss)[0]
    name = [item for item in [str(x) for x in s.synonyms]] 
    print name
foo()

If I try words such as pregnant or ugly, I get this error:

IndexError: list index out of range

What could be the problem?

1 answer:

Answer 0: (score: 1)

There seem to be some differences between the NLTK wordnet interface and the Pattern wordnet interface:

>>> from nltk.corpus import wordnet as wn
>>> from pattern.en import wordnet as pwn
>>> wn.synsets('man')
[Synset('man.n.01'), Synset('serviceman.n.01'), Synset('man.n.03'), Synset('homo.n.02'), Synset('man.n.05'), Synset('man.n.06'), Synset('valet.n.01'), Synset('man.n.08'), Synset('man.n.09'), Synset('man.n.10'), Synset('world.n.08'), Synset('man.v.01'), Synset('man.v.02')]
>>> pwn.synsets('man')
[Synset(u'man'), Synset(u'serviceman'), Synset(u'man'), Synset(u'homo'), Synset(u'man'), Synset(u'man'), Synset(u'valet'), Synset(u'man'), Synset(u'Man'), Synset(u'man'), Synset(u'world')]
>>> len(wn.synsets('man'))
13
>>> len(pwn.synsets('man'))
11

Checking the official Princeton WordNet, there are 13 synsets; see http://wordnetweb.princeton.edu/perl/webwn?s=man&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=

Checking the pattern code, this seems to be related to the default POS being set to 'noun' (from https://github.com/clips/pattern/blob/master/pattern/text/en/wordnet/init.py#L93).

But there is a "gotcha" with the POS parameter: the pattern library does not accept strings such as 'n' or 'NOUN', only tags such as 'nn' and 'vb':

>>> pwn.synsets('man', pos='n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pattern/text/en/wordnet/__init__.py", line 109, in synsets
    raise TypeError, "part of speech must be NOUN, VERB, ADJECTIVE or ADVERB, not %s" % repr(pos)
TypeError: part of speech must be NOUN, VERB, ADJECTIVE or ADVERB, not 'n'
>>> pwn.synsets('man', pos='NOUN')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pattern/text/en/wordnet/__init__.py", line 109, in synsets
    raise TypeError, "part of speech must be NOUN, VERB, ADJECTIVE or ADVERB, not %s" % repr(pos)
TypeError: part of speech must be NOUN, VERB, ADJECTIVE or ADVERB, not 'noun'
>>> pwn.synsets('man', pos='nn')
[Synset(u'man'), Synset(u'serviceman'), Synset(u'man'), Synset(u'homo'), Synset(u'man'), Synset(u'man'), Synset(u'valet'), Synset(u'man'), Synset(u'Man'), Synset(u'man'), Synset(u'world')]
>>> pwn.synsets('man', pos='vb')
[Synset(u'man'), Synset(u'man')]

Now we have found the 2 missing synsets.

Q: Is pattern.en wordnet limited?

A: No.
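Since the POS gotcha is easy to trip over when moving between the two libraries, here is a minimal sketch of a tag translator. It assumes NLTK's single-letter tags ('n', 'v', 'a', 's', 'r') and the lowercase tags that pattern accepts in the sessions in this answer ('nn', 'vb', 'jj', 'rb'). The name `to_pattern_pos` is hypothetical, and mapping NLTK's satellite-adjective tag 's' to 'jj' is my own assumption.

```python
# Hypothetical helper: translate NLTK-style POS tags into the tags that
# pattern's synsets() accepted in the sessions shown in this answer.
NLTK_TO_PATTERN_POS = {
    'n': 'nn',  # noun
    'v': 'vb',  # verb
    'a': 'jj',  # adjective
    's': 'jj',  # satellite adjective (assumption: treat as adjective)
    'r': 'rb',  # adverb
}


def to_pattern_pos(pos):
    """Return a pattern-style tag for an NLTK-style or pattern-style tag."""
    key = pos.lower()
    if key in NLTK_TO_PATTERN_POS.values():
        return key  # already a pattern-style tag such as 'nn' or 'NN'
    try:
        return NLTK_TO_PATTERN_POS[key]
    except KeyError:
        raise ValueError("unknown part of speech: %r" % (pos,))
```

With pattern installed, the result could then be passed on as `pwn.synsets('man', pos=to_pattern_pos('v'))`.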

Caveats of the pattern API

When using the WordNet API in pattern, you need to specify the POS if the word is not a noun, e.g.:

>>> from pattern.en import wordnet as wn
>>> wn.synsets('pregnant', pos='jj')
[Synset(u'pregnant'), Synset(u'meaning'), Synset(u'fraught')]
>>> wn.synsets('pregnant')
[]
>>> wn.synsets('quickly', pos='rb')
[Synset(u'quickly'), Synset(u'promptly'), Synset(u'cursorily')]
>>> wn.synsets('quickly')
[]
>>> wn.synsets('run', pos='nn')
[Synset(u'run'), Synset(u'test'), Synset(u'footrace'), Synset(u'streak'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'rivulet'), Synset(u'political campaign'), Synset(u'run'), Synset(u'discharge'), Synset(u'run'), Synset(u'run')]
>>> wn.synsets('run', pos='vb')
[Synset(u'run'), Synset(u'scat'), Synset(u'run'), Synset(u'operate'), Synset(u'run'), Synset(u'run'), Synset(u'function'), Synset(u'range'), Synset(u'campaign'), Synset(u'play'), Synset(u'run'), Synset(u'tend'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'prevail'), Synset(u'run'), Synset(u'run'), Synset(u'carry'), Synset(u'run'), Synset(u'guide'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'run'), Synset(u'ply'), Synset(u'hunt'), Synset(u'race'), Synset(u'move'), Synset(u'melt'), Synset(u'ladder'), Synset(u'run')]

Q: Then why am I getting the weird IndexError?

A: Given the checks above, NLTK and Pattern both use the same Princeton WordNet 3.0, so there should not be a problem. Something may have gone wrong when downloading/installing pattern; try reinstalling:

pip install -U pattern
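Separately from reinstalling: since `synsets('pregnant')` returns an empty list when no POS is given, indexing `[0]` on that result is one plausible source of the IndexError in the question. Below is a minimal sketch of a guarded lookup that tries every POS tag and falls back to a default instead of raising. The `lookup` argument is meant to be a function with the same call shape as pattern's `synsets(word, pos=...)`; `fake_lookup` and its data are a hypothetical stand-in (with values copied from the sessions in this answer) so the sketch runs without pattern installed.

```python
# POS tags that pattern accepted in the sessions shown in this answer.
PATTERN_POS_TAGS = ('nn', 'vb', 'jj', 'rb')


def synsets_any_pos(word, lookup, pos_tags=PATTERN_POS_TAGS):
    """Collect the results of lookup(word, pos=tag) over all POS tags."""
    found = []
    for pos in pos_tags:
        found.extend(lookup(word, pos=pos))
    return found


def first_synset(word, lookup, default=None):
    """Return the first result for word, or default if nothing is found."""
    found = synsets_any_pos(word, lookup)
    return found[0] if found else default


# Hypothetical stand-in for pattern.en.wordnet.synsets, for illustration only.
_FAKE_DATA = {
    ('man', 'nn'): ['man', 'serviceman', 'man', 'homo', 'man', 'man',
                    'valet', 'man', 'Man', 'man', 'world'],
    ('man', 'vb'): ['man', 'man'],
    ('pregnant', 'jj'): ['pregnant', 'meaning', 'fraught'],
}


def fake_lookup(word, pos='nn'):
    """Mimic the noun-by-default behaviour described in this answer."""
    return list(_FAKE_DATA.get((word, pos), []))
```

With the stand-in, `first_synset('pregnant', fake_lookup)` finds the adjective sense and `first_synset('zzz', fake_lookup)` returns `None` rather than raising. With pattern installed, `lookup` would presumably be `pattern.en.wordnet.synsets`.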

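Q: For wordnet access, is nltk faster than pattern?

A: As for speed, both nltk and pattern store the synsets in a dictionary to be looked up, so I assume retrieval from the dictionary is equivalent. There may be some overhead when loading the nltk wordnet corpus, so it may be worth timing it.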
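If you want to measure rather than assume, a rough sketch with the standard-library `timeit` module could look like the following. The `time_lookup` helper and the two dictionary-backed stand-in lookups are hypothetical; they only show the shape of the comparison and say nothing about the real libraries' speed.

```python
import timeit


def time_lookup(lookup, words, repeat=3, number=1000):
    """Return the best total time (seconds) for `number` passes over words."""
    def run():
        for word in words:
            lookup(word)
    # Take the minimum over several repeats to reduce noise from other processes.
    return min(timeit.repeat(run, repeat=repeat, number=number))


# Hypothetical stand-ins for the two libraries' synset lookups.
_TABLE_A = {'man': ['man.n.01', 'man.v.01']}
_TABLE_B = {'man': ['man', 'serviceman']}


def lookup_a(word):
    return _TABLE_A.get(word, [])


def lookup_b(word):
    return _TABLE_B.get(word, [])
```

Calling `time_lookup(lookup_a, ['man', 'run'])` and `time_lookup(lookup_b, ['man', 'run'])` gives two numbers to compare. To time the real libraries, swap in their `synsets` functions, and time the imports separately, since any corpus-loading overhead would show up there or on the first call.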