I'm stuck and looking for help. I have the following code:
import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
d = {'col1': ['AI is our friend and it has been friendly', 'AI and human have always been friendly']}
df = pd.DataFrame(data=d)
sample_lst = []
for q in df['col1']:
    nltk_tokens = nltk.word_tokenize(q)
    for w in nltk_tokens:
        sample_lst.append(wordnet_lemmatizer.lemmatize(w, pos='v'))
print(sample_lst)
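The code works and appends the lemmatized words to the list. However, I would like to save the results to a CSV file, with each lemmatized sentence next to its original input, like this: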
Col1                                       Col2
AI is our friend and it has been friendly  AI be our friend and it have be friendly
AI and human have always been friendly     AI and human have always be friendly
I tried ''.join(), but the result was not what I expected. Any ideas on how to join each sentence back together and add it as a new column? Thanks.
Answer (score: 1)
Build one lemmatized string per row and assign the resulting list as a new column:
# one lemmatized sentence per row
out = []
for q in df['col1']:
    # fresh token list for each sentence
    sample_lst = []
    nltk_tokens = nltk.word_tokenize(q)
    for w in nltk_tokens:
        sample_lst.append(wordnet_lemmatizer.lemmatize(w, pos='v'))
    # join the tokens back into a sentence, separated by spaces
    out.append(' '.join(sample_lst))
df['Col2'] = out
print(df)
                                        col1                                      Col2
0  AI is our friend and it has been friendly  AI be our friend and it have be friendly
1     AI and human have always been friendly      AI and human have always be friendly
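A small sketch of why the original ''.join() attempt most likely went wrong. The question's loop collects the tokens of every row into one shared list, and '' inserts no separator, so all words of both rows run together into a single string. The answer fixes both problems by starting a new list per row and joining with ' '. The token lists below are hard-coded from the expected output, an assumption made so the sketch runs without NLTK's tokenizer and WordNet data.

```python
# Lemmatized tokens for each row, copied from the expected output above
# (hard-coded so this sketch runs without NLTK data).
row_tokens = [
    ['AI', 'be', 'our', 'friend', 'and', 'it', 'have', 'be', 'friendly'],
    ['AI', 'and', 'human', 'have', 'always', 'be', 'friendly'],
]

# The question's loop appends every token to one shared list...
flat = [tok for tokens in row_tokens for tok in tokens]
# ...so ''.join glues all words of all rows into one string with no spaces.
wrong = ''.join(flat)

# One list per row, joined with a space, gives one sentence per row.
right = [' '.join(tokens) for tokens in row_tokens]

print(wrong)
print(right)
```

The per-row list is what lets the result line up with the DataFrame's rows, so it can be assigned directly as a new column.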
Another solution, using a nested list comprehension:
df['Col2'] = [' '.join(wordnet_lemmatizer.lemmatize(w, pos='v')
                       for w in nltk.word_tokenize(q))
              for q in df['col1']]
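The question also asks for a CSV file, which the answer stops short of. A minimal sketch using pandas' DataFrame.to_csv follows. The DataFrame is rebuilt from the printed output above (an assumption so the sketch runs without NLTK data), and the file name lemmatized.csv in the comment is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the DataFrame built above; Col2 is copied from the printed output.
df = pd.DataFrame({
    'col1': ['AI is our friend and it has been friendly',
             'AI and human have always been friendly'],
    'Col2': ['AI be our friend and it have be friendly',
             'AI and human have always be friendly'],
})

# index=False drops the 0/1 row labels, so the file holds only the two columns.
# A real script would pass a path instead, e.g. df.to_csv('lemmatized.csv', index=False)
# (hypothetical file name); an in-memory buffer is used here to inspect the result.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

If the header should read Col1 as in the question, df.rename(columns={'col1': 'Col1'}) before writing renames the first column.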