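Build a DataFrame from the results of a for loop

Date: 2020-04-02 22:45:13

Tags: python dataframe

How can I put the results of this loop into a DataFrame? I have tried many of the suggested fixes for this problem, but none of them gave me all 100 rows I am looking for. Thanks.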

from nltk.sentiment.vader import SentimentIntensityAnalyzer

scores = df['clean_tweet']

sid = SentimentIntensityAnalyzer()

for score in scores:
    print(score)
    ss = sid.polarity_scores(score)
    for k in sorted(ss):
        print('{0}: {1}, '.format(k, ss[k]), end='')
    print()

    # Classify the sentiment as positive, negative, or neutral
    if ss['compound'] > 0:
        print("Pos")
    elif ss['compound'] < 0:
        print("Neg")
    else:
        print("Neutral")

2 Answers:

Answer 0: (score: 0)

Do you need a new column? If so, try this.

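import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Map a polarity_scores dict to a label based on its compound score
def sentiment_calculation(ss):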
    score = ss['compound']
    if score > 0:
        return "Pos"
    elif score < 0:
        return "Neg"
    else:
        return "Neutral"

df = pd.DataFrame({"ID":list(range(5)),
                   "clean_tweet":['I hate this movie', 'I love him', 'disgusting', 'love yo', 'Beautiful']})

sid = SentimentIntensityAnalyzer()

polarity_scores = df['clean_tweet'].apply(sid.polarity_scores)
df['sentiment'] = polarity_scores.apply(sentiment_calculation)

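This adds a sentiment column to df that holds Pos, Neg, or Neutral for each tweet.

[Image: the resulting DataFrame]

Just replace df with your own DataFrame.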
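The function above treats any nonzero compound score as positive or negative. A small variant with a neutral band is sketched below as one possible alternative. The name `sentiment_with_band`, the `threshold` parameter, and its 0.05 default are assumptions: they follow the cutoff convention suggested in the VADER project's README and are not part of the original answer.

```python
def sentiment_with_band(ss, threshold=0.05):
    """Label a polarity_scores-style dict, treating small scores as neutral.

    `threshold` is a hypothetical parameter; 0.05 is an assumed default.
    """
    score = ss["compound"]
    if score >= threshold:
        return "Pos"
    if score <= -threshold:
        return "Neg"
    return "Neutral"


# Plain dicts stand in for the output of sid.polarity_scores here.
labels = [sentiment_with_band({"compound": c}) for c in (0.4927, -0.5423, 0.01)]
print(labels)
```

It can be dropped in where sentiment_calculation is used, for example `df['sentiment'] = polarity_scores.apply(sentiment_with_band)`.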

Answer 1: (score: 0)

Assuming the df in your example already contains all the text you want to analyze in its 'clean_tweet' column, it's as simple as this:

from pandas import DataFrame
from nltk.sentiment.vader import SentimentIntensityAnalyzer

df = DataFrame([
    {'clean_tweet': 'this is a bad example'},
    {'clean_tweet': 'this is a really good one'}
])

sia = SentimentIntensityAnalyzer()

analyses = [
    {'clean_tweet': tweet, **sia.polarity_scores(tweet)}
    for tweet in df['clean_tweet']
]
result = DataFrame([
    {**analysis, 'score': 'Pos' if analysis['compound'] > 0
     else 'Neg' if analysis['compound'] < 0 else 'Neutral'}
    for analysis in analyses
])

print(result)

Output:

                 clean_tweet    neg    neu    pos  compound score
0      this is a bad example  0.538  0.462  0.000   -0.5423   Neg
1  this is a really good one  0.000  0.556  0.444    0.4927   Pos
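If you would rather keep the explicit for loop from the question, a minimal sketch is to append one dict per tweet to a list inside the loop and build the DataFrame once afterwards, instead of printing each result. The names `label_from_compound`, `collect_rows`, and `fake_polarity_scores` are hypothetical. `fake_polarity_scores` is only a stand-in with the same return shape as `sid.polarity_scores`, so the sketch runs without NLTK; its numbers are not real VADER output.

```python
def label_from_compound(compound):
    """Map a compound score to Pos / Neg / Neutral."""
    if compound > 0:
        return "Pos"
    if compound < 0:
        return "Neg"
    return "Neutral"


def collect_rows(tweets, polarity_scores):
    """Score every tweet and return one dict per tweet (one future row each)."""
    rows = []
    for tweet in tweets:
        ss = polarity_scores(tweet)  # dict with neg / neu / pos / compound
        rows.append({
            "clean_tweet": tweet,
            **ss,
            "sentiment": label_from_compound(ss["compound"]),
        })
    return rows


def fake_polarity_scores(text):
    """Hypothetical stand-in for sid.polarity_scores (NOT real VADER output)."""
    compound = 0.5 if "good" in text else -0.5 if "bad" in text else 0.0
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": compound}


rows = collect_rows(["a good one", "a bad one", "a plain one"], fake_polarity_scores)
for row in rows:
    print(row)

# With pandas and NLTK available, the real call would look like:
#   import pandas as pd
#   result = pd.DataFrame(collect_rows(df["clean_tweet"], sid.polarity_scores))
```

Because every tweet appends exactly one dict, the resulting DataFrame has one row per input tweet, which is what the question was missing when it only printed the values.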