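I'm trying to do sentiment analysis to score a large number of reviews in a DataFrame. I have a corpus of negative words and a corpus of positive words. I want to add 1 for each positive word in a review and subtract 1 for each negative word. My code: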
text['counts'] = 0
for i in text.Reviews:
    if i in p:
        text['counts'] += 1
    elif i in n:
        text['counts'] -= 1
I expected the new column text.counts to hold each review's own score, but so far every row just shows the overall total (as if my DataFrame were one big review).
Thanks!
Answer 0 (score: 0)
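This gives each review its own count instead of one global count. :) I duplicated the if statement that initializes the key inside each branch, because I assume you don't want to run that check on every iteration, only when a word actually matches, which should be a little cheaper. :D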
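comments_count = {}  # review index -> score
for comment_id, review in text.Reviews.items():
    # Split the review into words (plain whitespace split; an assumption)
    for word in review.split():
        # If the word is positive
        if word in p:
            # If the comment_id key hasn't been added yet...
            if comment_id not in comments_count:
                comments_count[comment_id] = 0
            comments_count[comment_id] += 1
        # If the word is negative
        elif word in n:
            # If the comment_id key hasn't been added yet...
            if comment_id not in comments_count:
                comments_count[comment_id] = 0
            comments_count[comment_id] -= 1
# Reviews without any matching word get a score of 0
text['commentsCount'] = [comments_count.get(i, 0) for i in text.index]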
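The "key hasn't been added yet" check in this answer can probably be dropped by using collections.defaultdict. The sketch below is only an illustration: the reviews dict is hypothetical sample data standing in for text.Reviews, and the words in p and n are assumed.

```python
from collections import defaultdict

# Hypothetical sample data: review id -> review text (stands in for text.Reviews).
reviews = {
    0: "good and not bad",
    1: "it is a terrible bad product",
    2: "excellent product",
}
p = {"good", "better", "excellent", "best"}  # assumed positive words
n = {"bad", "worse", "terrible"}             # assumed negative words

# defaultdict(int) starts every missing key at 0, so no membership check is needed.
comments_count = defaultdict(int)
for comment_id, review in reviews.items():
    for word in review.split():
        if word in p:
            comments_count[comment_id] += 1
        elif word in n:
            comments_count[comment_id] -= 1

# A review with no sentiment words never creates a key, so read with a default of 0.
scores = {comment_id: comments_count.get(comment_id, 0) for comment_id in reviews}
print(scores)
```

Because the counts live in a plain dict keyed by review id, each review keeps its own score rather than sharing one column-wide total.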
Answer 1 (score: 0)
Is this what you're looking for?
In [28]: text = pd.DataFrame( ['good and not bad', 'it is a terrible bad product', 'excellent product'], columns = ['reviews'])
In [29]: text
Out[29]:
                        reviews
0              good and not bad
1  it is a terrible bad product
2             excellent product
In [30]: n = set('bad worse terrible worse bad baddest'.split())
In [31]: p = set('good better excellent good best bestest good'.split())
In [32]: text['count'] = text['reviews'].apply(lambda review: sum(0 + ((word in p) and 1) + ((word in n) and -1) for word in review.split()))
In [33]: text
Out[33]:
                        reviews  count
0              good and not bad      0
1  it is a terrible bad product     -2
2             excellent product      1
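The lambda above matches words exactly, so "BAD!" or "Good," would not be counted. A minimal sketch of a named scoring function that lowercases the review and strips punctuation first might look like this; score_review and the regular expression are assumptions for illustration, not part of the answer.

```python
import re

p = {"good", "better", "excellent", "best"}  # assumed positive words
n = {"bad", "worse", "terrible"}             # assumed negative words

def score_review(review):
    """Return (number of positive words) - (number of negative words)."""
    # Lowercase, then keep only runs of letters/apostrophes so "BAD!" matches "bad".
    words = re.findall(r"[a-z']+", review.lower())
    # bool is a subclass of int in Python, so True - False == 1 and False - True == -1.
    return sum((word in p) - (word in n) for word in words)

print(score_review("Good, and not BAD!"))
```

It should drop into the same place as the lambda, for example text['count'] = text['reviews'].apply(score_review).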
Answer 2 (score: 0)
from collections import Counter
import pandas as pd
from nltk import word_tokenize
positive_words = set(['good', 'awesome', 'excellent'])
negative_words = set(['bad', 'terrible'])
df = pd.DataFrame( ['good and not bad', 'it is a terrible bad product', 'excellent product'], columns = ['Reviews'])
df['Tokenized'] = df['Reviews'].apply(str.lower).apply(word_tokenize)
df['WordCount'] = df['Tokenized'].apply(lambda x: Counter(x))
df['Positive'] = df['WordCount'].apply(lambda x: sum(v for k,v in x.items() if k in positive_words))
df['Negative'] = df['WordCount'].apply(lambda x: sum(v for k,v in x.items() if k in negative_words))
Then:
>>> df['Sentiment'] = df['Positive'] - df['Negative']
>>> df[['Reviews', 'Sentiment']]
                        Reviews  Sentiment
0              good and not bad          0
1  it is a terrible bad product         -2
2             excellent product          1
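The code above loops over the word counts twice (once for positive words, once for negative). Here is an alternative that does it in a single pass: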
from collections import Counter
import pandas as pd
from nltk import word_tokenize
positive_words = set(['good', 'awesome', 'excellent'])
negative_words = set(['bad', 'terrible'])
df = pd.DataFrame( ['good and not bad', 'it is a terrible bad product', 'excellent product'], columns = ['Reviews'])
df['Tokenized'] = df['Reviews'].apply(str.lower).apply(word_tokenize)
df['WordCount'] = df['Tokenized'].apply(lambda x: Counter(x))
df['Sentiment'] = df['WordCount'].apply(lambda x: sum(v if k in positive_words else -v if k in negative_words else 0 for k,v in x.items()))
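The single-pass idea can also be written with one word-to-weight lookup table, which would extend naturally to lexicons where some words count more than others. This is only a sketch: weights and sentiment are hypothetical names, and str.split() stands in for nltk's word_tokenize so the example runs without NLTK data.

```python
from collections import Counter

positive_words = {"good", "awesome", "excellent"}
negative_words = {"bad", "terrible"}

# One lookup table: word -> weight (+1 for positive, -1 for negative).
weights = {word: 1 for word in positive_words}
weights.update({word: -1 for word in negative_words})

def sentiment(review):
    """Score one review in a single pass over its word counts."""
    # Assumption: whitespace split replaces word_tokenize, so punctuation is not stripped.
    counts = Counter(review.lower().split())
    # Words missing from the table contribute 0.
    return sum(weights.get(token, 0) * count for token, count in counts.items())

print([sentiment(r) for r in ["good and not bad", "excellent product"]])
```

With a DataFrame this could be applied as df['Sentiment'] = df['Reviews'].apply(sentiment).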