Tokenize a sentence and re-join the result in Python

Date: 2019-01-24 06:19:16

Tags: python pandas nltk

I've run into a problem and am looking for help. I have the following code:

import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

d = {'col1': ['AI is our friend and it has been friendly', 'AI and human have always been friendly']}
df = pd.DataFrame(data=d)

sample_lst = []
for q in df['col1']:
    nltk_tokens = nltk.word_tokenize(q)
    for w in nltk_tokens:
        sample_lst.append(wordnet_lemmatizer.lemmatize(w, pos='v'))
        print(sample_lst)

The code works and appends the output of wordnet_lemmatizer.lemmatize to a list. However, I'd like to save the results to a CSV file, next to the original input, like this:

Col1                                        Col2
AI is our friend and it has been friendly   AI be our friend and it have be friendly
AI and human have always been friendly      AI and human have always be friendly

I tried ''.join(), but the result isn't what I expected. Any ideas on how to re-join the sentence and add it to a new column? Thanks.

1 Answer:

Answer 0: (score: 1)

Use:

# create a list holding one joined sentence per row
out = []
for q in df['col1']:
    # create a fresh list of lemmas for each row
    sample_lst = []
    nltk_tokens = nltk.word_tokenize(q)
    for w in nltk_tokens:
        sample_lst.append(wordnet_lemmatizer.lemmatize(w, pos='v'))
    # join the lemmas with a single space
    out.append(' '.join(sample_lst))

df['Col2'] = out
print(df)
                                        col1  \
0  AI is our friend and it has been friendly   
1     AI and human have always been friendly   

                                       Col2  
0  AI be our friend and it have be friendly  
1      AI and human have always be friendly  
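
A likely reason the ''.join() attempt looked wrong is the separator: an empty string glues the tokens together with nothing between them, whereas ' '.join() puts one space between each pair. Separately, the question's loop keeps a single sample_lst for all rows, so the list has to be reset per row, as in the loop above. The sketch below only illustrates the separator difference; the token list is hard-coded (an assumption, chosen to match the first row of the output above) so it runs without NLTK:

```python
# Tokens as they would look after lemmatizing the first sample sentence.
# Hard-coded for illustration so the snippet does not need NLTK.
tokens = ['AI', 'be', 'our', 'friend', 'and', 'it', 'have', 'be', 'friendly']

glued = ''.join(tokens)    # empty separator: the words run together
spaced = ' '.join(tokens)  # single-space separator: a readable sentence

print(glued)
print(spaced)
```

Note that word_tokenize also splits punctuation into separate tokens, so for sentences containing commas or periods a plain ' '.join() will leave a space before each punctuation mark.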

Another solution, using a nested list comprehension:

df['Col2'] = [' '.join(wordnet_lemmatizer.lemmatize(w, pos='v') 
              for w in nltk.word_tokenize(q)) 
              for q in df['col1']]
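
The question also asks for the result to be saved as CSV, which the code above does not cover. A minimal sketch, assuming the DataFrame already holds both columns (the values here are hard-coded from the printed output above rather than recomputed with NLTK, and the file name in the final comment is hypothetical):

```python
import io

import pandas as pd

# Stand-in for the DataFrame after 'Col2' has been added.
df = pd.DataFrame({
    'col1': ['AI is our friend and it has been friendly',
             'AI and human have always been friendly'],
    'Col2': ['AI be our friend and it have be friendly',
             'AI and human have always be friendly'],
})

# Write to an in-memory buffer so the CSV text can be inspected directly;
# index=False leaves out the 0, 1, ... row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# To write to disk instead, pass a path (hypothetical file name):
# df.to_csv('lemmatized.csv', index=False)
```

Passing index=False is a design choice: without it, pandas writes the row index as an extra first column, which is usually not wanted when the goal is just the original sentence next to its lemmatized form.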