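I would like to add some exceptions to the lemmatization results. For example, when I test wnl.lemmatize('cookies'), I get cooky rather than cookie. How can I make the lemmatizer return cookie instead?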
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def text_cleaning(text):
    text = text.lower()
    tok_list = [wnl.lemmatize(w, tag[0].lower()) if tag[0].lower() in ['a', 'n', 'v'] else wnl.lemmatize(w) for w, tag in pos_tag(word_tokenize(text))]
    return ' '.join(tok_list)
Answer 0 (score: 1)
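Looking closely at the implementation here, you could probably do something like this:

from nltk.stem import WordNetLemmatizer

class WNWrapper(WordNetLemmatizer):
    def __init__(self, custom_transforms):
        super().__init__()
        # Maps a word to the lemma it should produce, e.g. {'cookies': 'cookie'}
        self.custom_transforms = custom_transforms

    def lemmatize(self, word, pos='n'):
        # Custom overrides take precedence over the WordNet lookup.
        if word in self.custom_transforms:
            return self.custom_transforms[word]
        return super().lemmatize(word, pos)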
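But this only works when
1) you know in advance which words should or should not be changed, and
2) there are only a few of them. It obviously does not scale.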
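
If subclassing feels heavier than needed, the same override idea can be sketched as a plain wrapper around any lemmatize function. This is only an illustration: with_exceptions and fake_wordnet_lemmatize are hypothetical names, and the fake function stands in for wnl.lemmatize so the snippet runs without the WordNet data installed.

```python
from typing import Callable, Dict


def with_exceptions(base_lemmatize: Callable[..., str],
                    exceptions: Dict[str, str]) -> Callable[..., str]:
    """Wrap a lemmatize function so that listed words bypass it (hypothetical helper)."""
    def lemmatize(word: str, pos: str = "n") -> str:
        # An entry in the exceptions table wins over the base lemmatizer.
        if word in exceptions:
            return exceptions[word]
        return base_lemmatize(word, pos)
    return lemmatize


def fake_wordnet_lemmatize(word: str, pos: str = "n") -> str:
    # Stand-in for wnl.lemmatize (an assumption for this sketch);
    # it reproduces only the 'cookies' -> 'cooky' behaviour from the question.
    return {"cookies": "cooky"}.get(word, word)


lemmatize = with_exceptions(fake_wordnet_lemmatize, {"cookies": "cookie"})
print(lemmatize("cookies"))
print(lemmatize("dogs", "n"))
```

In the question's text_cleaning function, passing wnl.lemmatize as the base and calling the returned function in place of wnl.lemmatize should give the same override behaviour. The limitation is unchanged: the exceptions table still has to be maintained by hand.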