After applying tokenization, I have a pandas DataFrame as shown below. I want to apply the nltk lemmatizer to this DataFrame. What I have tried is given below, but I get the error "if form in exceptions: TypeError: unhashable type: 'list'". How can I implement the lemmatizer correctly here?
Also note that the fifth DataFrame cell contains an empty list. How can I remove such lists from this DataFrame?
[[ive, searching, right, words, thank, breather], [i, promise, wont, take, help, granted, fulfil, promise], [you, wonderful, blessing, times]]
[[free, entry, 2, wkly, comp, win, fa, cup, final, tkts, 21st, may, 2005], [text, fa, 87121, receive, entry, questionstd, txt, ratetcs, apply, 08452810075over18s]]
[[nah, dont, think, goes, usf, lives, around, though]]
[[even, brother, like, speak, me], [they, treat, like, aids, patent]]
[[i, date, sunday, will], []]
The lemmatizer function I tried:
def lemmatize(fullCorpus):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    lemmatized = fullCorpus['tokenized'].apply(lambda row: list(map([lemmatizer.lemmatize(y) for y in row])))
    return lemmatized
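For context on the reported error: `lemmatize` expects a single string, but since each row is a list of token lists, the comprehension hands it a whole inner list. WordNet's exception table is a dict keyed by strings, so the membership test `form in exceptions` tries to hash that list and fails. A minimal reproduction without nltk (the `exceptions` dict here is a stand-in, not WordNet's actual data):

```python
# Stand-in for WordNet's exception table, which is keyed by strings.
exceptions = {'mice': 'mouse'}

try:
    ['mice'] in exceptions  # dict membership hashes the key; a list is unhashable
    msg = None
except TypeError as e:
    msg = str(e)

print(msg)  # unhashable type: 'list'
```

(The original code also calls `map` with a single list argument, which is a separate `TypeError` waiting to happen; `map` needs a function as its first argument.)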
Answer 0 (score: 1)
You can try the following:
def lemmatize(fullCorpus):
    lemmatizer = nltk.stem.WordNetLemmatizer()
    lemmatized = fullCorpus['tokenized'].apply(
        lambda row: list(list(map(lemmatizer.lemmatize, y)) for y in row))
    return lemmatized
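This fixes the lemmatizer call by mapping `lemmatizer.lemmatize` over each token of each inner list, but it does not address the second part of the question: dropping the empty inner lists (as in the fifth cell). A minimal sketch of that step, assuming the column is named `tokenized` as in the question (the toy DataFrame below is illustrative, not the asker's data):

```python
import pandas as pd

# Toy DataFrame mirroring the question's structure: each cell holds a list
# of token lists, some of which may be empty.
df = pd.DataFrame({'tokenized': [
    [['i', 'date', 'sunday', 'will'], []],
    [['nah', 'dont', 'think']],
]})

# Keep only the non-empty inner lists in every cell; empty lists are falsy.
df['tokenized'] = df['tokenized'].apply(lambda row: [sent for sent in row if sent])

print(df['tokenized'].tolist())
# [[['i', 'date', 'sunday', 'will']], [['nah', 'dont', 'think']]]
```

The same filter can be chained before the lemmatizing step, so empty lists never reach the lemmatizer at all.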