I know WordNet has an "adjective satellite" synset type, and I know it is in the synset type enum in NLTK:
>>> from nltk.corpus import wordnet as wn
>>> wn.ADJ_SAT
u's'
Why can't I pass it as the POS key to synsets()?
>>> wn.synsets('dog', wn.ADJ_SAT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1413, in synsets
    for form in self._morphy(lemma, p)
  File "/Library/Python/2.7/site-packages/nltk/corpus/reader/wordnet.py", line 1627, in _morphy
    substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: u's'
Answer 0 (score: 1)
Given this session:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('able')
[Synset('able.a.01'), Synset('able.s.02'), Synset('able.s.03'), Synset('able.s.04')]
>>> wn.synsets('able', pos=wn.ADJ)
[Synset('able.a.01'), Synset('able.s.02'), Synset('able.s.03'), Synset('able.s.04')]
>>> wn.synsets('able', pos=wn.ADJ_SAT)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1413, in synsets
    for form in self._morphy(lemma, p)
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/wordnet.py", line 1627, in _morphy
    substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
KeyError: u's'
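The session above also shows that pos=wn.ADJ already returns the satellite synsets (the ones with .s. in their names). So one possible workaround that does not touch NLTK is to query with ADJ and filter on each synset's own POS. The sketch below is only an illustration: satellite_synsets is a hypothetical helper, and FakeSynset and FakeReader are stand-ins so the snippet runs without the WordNet data. With NLTK 3 you would presumably pass wn itself, assuming Synset.pos() is a method there.

```python
class FakeSynset(object):
    """Stand-in for an NLTK Synset; only name() and pos() are modelled."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name

    def pos(self):
        # The POS tag is the middle field of names such as 'able.s.02'.
        return self._name.split('.')[1]


class FakeReader(object):
    """Stand-in for `wn`; returns the synsets shown in the session above."""

    def synsets(self, lemma, pos=None):
        if lemma != 'able':
            return []
        names = ['able.a.01', 'able.s.02', 'able.s.03', 'able.s.04']
        return [FakeSynset(n) for n in names]


def satellite_synsets(reader, lemma):
    """Hypothetical helper: query as ADJ ('a'), keep only satellites ('s')."""
    return [ss for ss in reader.synsets(lemma, pos='a') if ss.pos() == 's']


result = [ss.name() for ss in satellite_synsets(FakeReader(), 'able')]
print(result)
```

This relies on the ADJ query returning both head and satellite adjectives, which is what the session above shows for 'able'.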
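From https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1397, we see that when you retrieve synsets through the NLTK wordnet API, the POS restriction is applied inside the list comprehension that is returned, which calls self._morphy(lemma, p):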
def synsets(self, lemma, pos=None, lang='en'):
    """Load all synsets with a given lemma and part of speech tag.
    If no pos is specified, all synsets for all parts of speech
    will be loaded.
    If lang is specified, all the synsets associated with the lemma name
    of that language will be returned.
    """
    lemma = lemma.lower()
    if lang == 'en':
        get_synset = self._synset_from_pos_and_offset
        index = self._lemma_pos_offset_map
        if pos is None:
            pos = POS_LIST
        return [get_synset(p, offset)
                for p in pos
                for form in self._morphy(lemma, p)
                for offset in index[form].get(p, [])]
Now look at the _morphy() function at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1573:
def _morphy(self, form, pos):
    # from jordanbg:
    # Given an original string x
    # 1. Apply rules once to the input to get y1, y2, y3, etc.
    # 2. Return all that are in the database
    # 3. If there are no matches, keep applying rules until you either
    #    find a match or you can't go any further
    exceptions = self._exception_map[pos]
    substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos]
    def apply_rules(forms):
        return [form[:-len(old)] + new
                for form in forms
                for old, new in substitutions
                if form.endswith(old)]
    def filter_forms(forms):
        result = []
        seen = set()
        for form in forms:
            if form in self._lemma_pos_offset_map:
                if pos in self._lemma_pos_offset_map[form]:
                    if form not in seen:
                        result.append(form)
                        seen.add(form)
        return result
    # 0. Check the exception lists
    if form in exceptions:
        return filter_forms([form] + exceptions[form])
    # 1. Apply rules once to the input to get y1, y2, y3, etc.
    forms = apply_rules([form])
    # 2. Return all that are in the database (and check the original too)
    results = filter_forms([form] + forms)
    if results:
        return results
    # 3. If there are no matches, keep applying rules until we find a match
    while forms:
        forms = apply_rules(forms)
        results = filter_forms(forms)
        if results:
            return results
    # Return an empty list if we can't find anything
    return []
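We see that it looks up substitution rules with substitutions = self.MORPHOLOGICAL_SUBSTITUTIONS[pos] and uses them to perform some morphological reduction before retrieving the Synsets, which are stored under their "base"/"root" form. E.g.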
>>> from nltk.corpus import wordnet as wn
>>> wn._morphy('dogs', 'n')
[u'dog']
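To make the lookup-then-strip behaviour concrete, here is a minimal, self-contained sketch of a single rule pass. It is not NLTK's implementation: SUBSTITUTIONS, LEXICON and toy_morphy are made-up names with toy data, the exception lists are left out, and the repeated-application step is omitted. It does show why a POS tag with no entry in the rule table fails with a KeyError before any lookup happens.

```python
# Toy substitution table and lexicon; both are illustrative, not NLTK's data.
SUBSTITUTIONS = {
    'n': [('s', ''), ('ses', 's'), ('ies', 'y')],
    'a': [('er', ''), ('est', '')],
}
LEXICON = {'dog': {'n'}, 'able': {'a', 's'}}


def toy_morphy(form, pos):
    """Apply each matching suffix rule once and keep forms found in LEXICON."""
    rules = SUBSTITUTIONS[pos]  # raises KeyError if pos has no entry
    candidates = [form] + [form[:-len(old)] + new
                           for old, new in rules
                           if form.endswith(old)]
    found = []
    for cand in candidates:
        # Keep a candidate only if the lexicon lists it under this POS.
        if pos in LEXICON.get(cand, set()) and cand not in found:
            found.append(cand)
    return found


print(toy_morphy('dogs', 'n'))

try:
    toy_morphy('able', 's')
except KeyError as err:
    # 'able' is in the lexicon under 's', but the rule table has no 's' key.
    print('KeyError: %s' % err)
```

Note that the failure is independent of the word being looked up: the lexicon does contain 'able' under 's', yet the call fails on the rule-table lookup.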
If we look at MORPHOLOGICAL_SUBSTITUTIONS, we see that ADJ_SAT is missing; see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1609:
MORPHOLOGICAL_SUBSTITUTIONS = {
    NOUN: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
           ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
           ('men', 'man'), ('ies', 'y')],
    VERB: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
           ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
    ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
    ADV: []}
So, to prevent this, add the following line right after the MORPHOLOGICAL_SUBSTITUTIONS dictionary defined at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1609:
MORPHOLOGICAL_SUBSTITUTIONS[ADJ_SAT] = MORPHOLOGICAL_SUBSTITUTIONS[ADJ]
As a proof of concept (the integer keys 1 to 5 stand in for the POS constants):
>>> MORPHOLOGICAL_SUBSTITUTIONS = {
...     1: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
...         ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
...         ('men', 'man'), ('ies', 'y')],
...     2: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
...         ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
...     3: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
...     4: []}
>>>
>>> MORPHOLOGICAL_SUBSTITUTIONS[5] = MORPHOLOGICAL_SUBSTITUTIONS[3]
>>> MORPHOLOGICAL_SUBSTITUTIONS
{1: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'), ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'), ('men', 'man'), ('ies', 'y')], 2: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''), ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')], 3: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')], 4: [], 5: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')]}
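The edit above changes NLTK's own source file. If you would rather not do that, the same entry could in principle be added at run time instead. The sketch below shows the idea on a stand-in class: FakeReader and add_satellite_rules are hypothetical names, and the assumption that the real table is reachable as an attribute of the reader class is inferred only from the self.MORPHOLOGICAL_SUBSTITUTIONS access in _morphy, so check your NLTK version before relying on it.

```python
class FakeReader(object):
    """Stand-in for the WordNet reader class, with an abbreviated rule table."""
    MORPHOLOGICAL_SUBSTITUTIONS = {
        'n': [('s', ''), ('ies', 'y')],
        'v': [('s', ''), ('ing', '')],
        'a': [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
        'r': [],
    }


def add_satellite_rules(reader_cls, adj='a', adj_sat='s'):
    """Hypothetical helper: give ADJ_SAT its own copy of the ADJ rules."""
    table = reader_cls.MORPHOLOGICAL_SUBSTITUTIONS
    # setdefault leaves an existing entry alone, so calling this twice is safe.
    table.setdefault(adj_sat, list(table[adj]))


add_satellite_rules(FakeReader)
table = FakeReader.MORPHOLOGICAL_SUBSTITUTIONS
print(table['s'])
print(table['s'] is table['a'])  # False: the two lists are separate objects
```

Copying with list(...) differs slightly from the one-line fix above, which makes both keys point at the same list object. Either works for lookups; the copy just avoids a later change to one entry silently affecting the other.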
Answer 1 (score: 1)