Question

I noticed that WordNetLemmatizer() seems to behave differently depending on how I pass it the argument.

When I run:

from nltk.stem import WordNetLemmatizer

m = [('recurrances', 'NNS')]
wnl = WordNetLemmatizer()
print('>>>>', wnl.lemmatize(m[0][0], 'n'))

The result is "recurrances", but when I run:

print('>>>>', wnl.lemmatize('recurrances', 'n'))

The result is the one I expect: "recurrence".

Why does this happen? Is there a way to get the correct result (i.e. the singular form) in the first case?

Answer 1

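I can't reproduce the difference you describe between passing m[0][0] and passing the string directly. If there were one it would surprise me; it is probably caused by something else in your code.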
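One way to see why no difference is expected: indexing the list of tuples yields an ordinary string, equal to the literal. The sketch below is only an illustrative check (the variable names from_tuple and literal are made up here), and it does not call NLTK at all:

```python
# Illustrative check only: the element pulled out of the list of tuples
# is a plain str, equal in value and type to the literal.
m = [('recurrances', 'NNS')]

from_tuple = m[0][0]
literal = 'recurrances'

same_value = from_tuple == literal
same_type = type(from_tuple) is type(literal)

print(same_value, same_type)
# repr() makes stray whitespace or a misspelling in the real data visible.
print(repr(from_tuple))
```

Since both values are the same string, any function receives identical input in the two calls. If the outputs differ in your script, comparing repr() of the two inputs is a quick way to spot a typo or hidden character.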

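As for why it doesn't return what you expect: most likely WordNet doesn't recognize the word you are feeding to the lemmatizer. It's not an ideal solution, but you can work around this, as @alvas suggested, by falling back to a stemmer. It is more basic, but also more robust. Something like this: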

from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from nltk.stem import SnowballStemmer
snowball_stemmer = SnowballStemmer('english')

m=[('recurrances', 'NNS')]
wnl = WordNetLemmatizer()
print(wnl.lemmatize(m[0][0], pos='n'))
print(wnl.lemmatize('recurrances', pos='n'))

word = m[0][0]
if wordnet.synsets(word):
    lemma = wnl.lemmatize(word, pos='n') # you might want to do pos-tagging if you have the whole sentence to not always pass it pos-tag n...
    print('Lemma:', lemma)
else:
    stem = snowball_stemmer.stem(word)
    print('Stem:', stem)

Output (as you can see, the first two lines are identical):

recurrances
recurrances
Stem: recurr
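The code comment above mentions POS tagging instead of always passing 'n'. A minimal sketch of that idea, assuming the tags in the list are Penn Treebank tags such as 'NNS': the helper penn_to_wordnet_pos is a hypothetical name introduced here, not part of NLTK, and the mapping is deliberately coarse. The letters 'n', 'v', 'a', 'r' are the values WordNet uses for noun, verb, adjective and adverb.

```python
def penn_to_wordnet_pos(tag, default='n'):
    """Map a Penn Treebank tag to a WordNet POS letter.

    Hypothetical helper (not an NLTK function); unknown tags fall back
    to `default`.
    """
    if tag.startswith('JJ'):
        return 'a'  # adjective
    if tag.startswith('VB'):
        return 'v'  # verb
    if tag.startswith('NN'):
        return 'n'  # noun
    if tag.startswith('RB'):
        return 'r'  # adverb
    return default


m = [('recurrances', 'NNS')]

# Pair each word with the POS letter the lemmatizer expects.
pairs = [(word, penn_to_wordnet_pos(tag)) for word, tag in m]
print(pairs)
```

Each pair could then be passed on as wnl.lemmatize(word, pos=pos). This only selects the part of speech; a word that WordNet does not know would still come back unchanged, so the stemmer fallback remains useful.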

Issue with WordNetLemmatizer() when using a list of tuples
