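I am trying to lemmatize the words in a text. For example, 'pickled' should become 'pickle', 'ran' should become 'run', 'raisins' should become 'raisin', and so on.
I am using NLTK's WordNet Lemmatizer as follows: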
>>> from nltk.stem import WordNetLemmatizer
>>> lem = WordNetLemmatizer()
>>> print(lem.lemmatize("cats"))
cat
>>> print(lem.lemmatize("pickled"))
pickled
>>> print(lem.lemmatize("ran"))
ran
As you can see, the output for 'pickled' and 'ran' is not what I expected. How can I get 'pickle' and 'run' without having to specify 'v' (verb) etc. for each word?
Answer 0 (score: 2)
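You can get the base form of a noun or a verb by passing 'n' or 'v' as the pos argument of lemmatize(). If you omit the argument, it defaults to 'n' (noun), which is why 'pickled' and 'ran' come back unchanged in your example.
There is no direct way to do this, but you can try the following code to get the base form of a noun or verb: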
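from nltk.stem import WordNetLemmatizer

def most_common(lst):
    # Return the element that occurs most often in lst.
    return max(set(lst), key=lst.count)
lem = WordNetLemmatizer()
words = ['ran', 'pickled', 'cats', "crying", "died", "raisins", "had"]
for word in words:
    # The noun lemma is listed twice, so it wins whenever it differs from the verb lemma.
    checkList = [lem.lemmatize(word, 'v'), lem.lemmatize(word, 'n'), lem.lemmatize(word, 'n')]
    print(most_common(checkList))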
This gives the base forms:
ran
pickled
cat
cry
died
raisin
had
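As the output shows, 'ran', 'pickled', 'died' and 'had' still come back unchanged. One possible variation, sketched here under stated assumptions, is to try each part of speech in a fixed order and keep the first lemma that actually differs from the input. The helper name lemmatize_any_pos, its pos_order parameter, and the demo() function are hypothetical names introduced for illustration; only WordNetLemmatizer.lemmatize(word, pos) comes from NLTK. Trying 'v' first is a heuristic and can mislemmatize nouns that look like verb forms, so treat this as a starting point rather than a reliable solution.

```python
def lemmatize_any_pos(word, lemmatize, pos_order=("v", "n", "a", "r")):
    """Return the first lemma that differs from `word`.

    `lemmatize` is any callable taking (word, pos), for example
    WordNetLemmatizer().lemmatize. `pos_order` is an assumed priority
    order of WordNet POS tags: verb, noun, adjective, adverb.
    """
    for pos in pos_order:
        lemma = lemmatize(word, pos)
        if lemma != word:
            # This POS produced a changed form, so accept it.
            return lemma
    # No POS changed the word; return it as-is.
    return word


def demo():
    # Imported lazily so the helper above has no hard dependency on NLTK.
    # Requires the WordNet corpus, e.g. via nltk.download("wordnet").
    from nltk.stem import WordNetLemmatizer

    lem = WordNetLemmatizer()
    for word in ["ran", "pickled", "cats", "crying", "died", "raisins", "had"]:
        print(word, "->", lemmatize_any_pos(word, lem.lemmatize))
```

Calling demo() prints each word next to the lemma chosen by this heuristic. If the words come from full sentences, tagging them with a POS tagger and passing the matching tag to lemmatize() is generally more accurate than guessing.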