I'm using NLTK's SentimentIntensityAnalyzer() on a corpus stored in a pandas column. Calling .polarity_scores() returns a dictionary with four keys and their values: neg, neu, pos, and compound.
I want to iterate over each row of the dataframe, compute the polarity scores for the text in joined_corpus['body'], and unpack the resulting dictionary into four columns of the dataframe. I couldn't think of a way to unpack multiple key:value pairs into columns in pandas, so I resorted to the following for loop:
for index, row in joined_corpus.iterrows():
    sentiment = sid.polarity_scores(row['body'])
    joined_corpus.loc[index, 'neg'] = sentiment['neg']
    joined_corpus.loc[index, 'neu'] = sentiment['neu']
    joined_corpus.loc[index, 'pos'] = sentiment['pos']
    joined_corpus.loc[index, 'compound'] = sentiment['compound']  # was sentiment['pos'], a copy-paste bug
    print("sentiment calculated for " + row['subreddit'] + "of" + str(sentiment))
This produces output like the following:
sentiment calculated for 1200isplentyof{'neg': 0.067, 'neu': 0.745, 'pos': 0.188, 'compound': 1.0}
sentiment calculated for 2007scapeof{'neg': 0.092, 'neu': 0.77, 'pos': 0.138, 'compound': 0.9998}
sentiment calculated for 2b2tof{'neg': 0.123, 'neu': 0.768, 'pos': 0.109, 'compound': -0.9981}
sentiment calculated for 2healthbarsof{'neg': 0.096, 'neu': 0.762, 'pos': 0.142, 'compound': 0.9994}
sentiment calculated for 2meirl4meirlof{'neg': 0.12, 'neu': 0.709, 'pos': 0.171, 'compound': 0.9997}
sentiment calculated for 3DSof{'neg': 0.054, 'neu': 0.745, 'pos': 0.201, 'compound': 1.0}
sentiment calculated for 3Dprintingof{'neg': 0.056, 'neu': 0.812, 'pos': 0.131, 'compound': 1.0}
sentiment calculated for 3dshacksof{'neg': 0.055, 'neu': 0.804, 'pos': 0.141, 'compound': 1.0}
sentiment calculated for 40kLoreof{'neg': 0.123, 'neu': 0.747, 'pos': 0.13, 'compound': 0.9545}
sentiment calculated for 49ersof{'neg': 0.098, 'neu': 0.715, 'pos': 0.187, 'compound': 1.0}
However, this is obviously slow, since it doesn't use pandas' built-in apply. Is there a way to avoid the loop here?
Answer 0: (score: 1)
Use apply:
sentiment = df['body'].apply(lambda x: sid.polarity_scores(x))
df = pd.concat([df, sentiment.apply(pd.Series)], axis=1)  # pass axis as a keyword; the bare positional form is deprecated
Then,
"sentiment calculated for "+df['subreddit']+'of'+ sentiment.astype(str)
Answer 1: (score: 1)
You can use a list comprehension:

res = [sid.polarity_scores(x) for x in df['body']]
for item in res:
    print(item)

You can also create a Series directly from this list.
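A runnable sketch of that last suggestion: building the score columns from the list of dicts in one shot. As above, `polarity_scores_stub` is a hypothetical stand-in for `sid.polarity_scores` so the example is self-contained.

```python
import pandas as pd

# Stand-in for sid.polarity_scores (same dict shape as NLTK's output).
def polarity_scores_stub(text):
    return {'neg': 0.1, 'neu': 0.7, 'pos': 0.2, 'compound': 0.5}

df = pd.DataFrame({'body': ['some text', 'more text']})

# List comprehension over the column, as in the answer above.
res = [polarity_scores_stub(x) for x in df['body']]

# pd.DataFrame accepts a list of dicts directly; reuse df's index
# so the new columns line up row-for-row when joined back.
scores = pd.DataFrame(res, index=df.index)
df = df.join(scores)
```

This avoids the per-row `.loc` assignments of the original loop while keeping the scoring itself as a plain comprehension.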