NLTK's WordNet lemmatizer does not lemmatize all words

Date: 2017-07-29 01:11:32

Tags: python nlp nltk wordnet lemmatization

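I am trying to lemmatize the words in a text. For example, 'pickled' should become 'pickle', 'ran' should become 'run', 'raisins' should become 'raisin', and so on.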

I am using NLTK's WordNet lemmatizer as follows:

>>> from nltk.stem import WordNetLemmatizer
>>> 
>>> lem = WordNetLemmatizer()
>>> print(lem.lemmatize("cats"))
cat
>>> print(lem.lemmatize("pickled"))
pickled
>>> print(lem.lemmatize("ran"))
ran

As you can see, the output for 'pickled' and 'ran' is not what I expected. How can I get 'pickle' and 'run' without having to specify 'v' (verb) for these words?

1 answer:

Answer 0 (score: 2)

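You can get the base form of a noun or a verb by calling lemmatize() with the 'v' argument, with the 'n' argument, and with no POS argument at all (which defaults to 'n'), and then taking the most common of the results.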
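To see what this vote returns when the verb and noun readings disagree, here is a minimal sketch using the same `most_common` helper the answer defines, applied to hand-typed candidate lists in the order [pos='v', pos='n', default POS]. The noun/default entries ("ran", "pickled") are taken from the question's own output; the verb entries ("run", "pickle") are assumed to be what WordNet returns for pos='v', i.e. the forms the question expects.

```python
def most_common(lst):
    # Same helper as in the answer: the element that occurs most often in lst
    return max(set(lst), key=lst.count)

# Hand-typed candidates (assumption): [verb lemma, noun lemma, default-POS lemma].
# The default POS is 'n', so the noun lemma always appears twice.
ran_candidates = ["run", "ran", "ran"]
pickled_candidates = ["pickle", "pickled", "pickled"]

print(most_common(ran_candidates))      # ran
print(most_common(pickled_candidates))  # pickled
```

Because the noun lemma always gets two of the three votes, the vote returns the noun lemma whenever the two readings differ, which is the same result as calling lemmatize(word) on its own.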

This is not a direct method, but you can try the following code to get the base form of nouns or verbs:

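from nltk.stem import WordNetLemmatizer

lem = WordNetLemmatizer()
def most_common(lst):
    # Return the element that occurs most often in lst
    return max(set(lst), key=lst.count)
words = ['ran', 'pickled', 'cats', 'crying', 'died', 'raisins', 'had']
for word in words:
    # Candidates: verb lemma, noun lemma, and default-POS lemma (the default is 'n')
    checkList = [lem.lemmatize(word, 'v'), lem.lemmatize(word, 'n'), lem.lemmatize(word)]
    print(most_common(checkList))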

This prints the following base forms:

ran
pickled
cat
crying
died
raisin
had
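A common alternative (not what this answer does) is to let a part-of-speech tagger choose the POS for each word and pass that to lemmatize(). The sketch below assumes NLTK's `pos_tag` with its tagger model and the `wordnet` corpus already downloaded via `nltk.download`; the helper names `penn_to_wordnet` and `lemmatize_tokens` are hypothetical, chosen for illustration.

```python
# WordNet's POS constants are plain one-letter strings:
# wordnet.NOUN == 'n', wordnet.VERB == 'v', wordnet.ADJ == 'a', wordnet.ADV == 'r'.

def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("R"):
        return "r"  # adverbs: RB, RBR, RBS (particle RP also lands here)
    return "n"      # everything else: fall back to noun, lemmatize()'s default

def lemmatize_tokens(tokens):
    """Lemmatize tokens using the POS chosen by the tagger (requires NLTK and its data)."""
    # Imported inside the function so penn_to_wordnet() is usable without NLTK installed.
    from nltk import pos_tag
    from nltk.stem import WordNetLemmatizer

    lem = WordNetLemmatizer()
    return [lem.lemmatize(tok, penn_to_wordnet(tag)) for tok, tag in pos_tag(tokens)]
```

For example, `lemmatize_tokens("She ran home and pickled the raisins".split())`. The result depends on the tagger's guesses, so tagging whole sentences rather than isolated words generally gives it more context to pick the right POS.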