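I currently have a pandas DataFrame containing tokenized tweets.
I need to be able to go through each tweet and determine whether it is positive or negative, so that I can add a further column containing the word positive or negative.
Example data: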
tokenized_tweets = ['football, was, good, we, played, well' , 'We, were, unlucky, today, bad, luck' , 'terrible, performance, bad, game']
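I need to be able to run a loop over tokenized_tweets and determine whether each tweet is positive or negative.
For this example, the positive and negative words are as follows: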
Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']
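The desired output is a DataFrame containing each tweet, how many positive words it contains, how many negative words it contains, and whether the tweet is positive, negative, or neutral.
Whether a tweet is positive, negative, or neutral should be decided by whether it contains more positive words or more negative words.
Desired output: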
Tokenized tweet                        positive words  negative words  overall
football, was, good, we, played, well  1               0               positive
We, were, unlucky, today, bad, luck    0               1               negative
terrible, performance, bad, game       0               2               negative
Answer 0 (score: 0)
import pandas as pd
import numpy as np

df = pd.DataFrame({'tokenized_tweets': ['football, was, good, we, played, well', 'We, were, unlucky, today, bad, luck', 'terrible, performance, bad, game']})
Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']

# Count regex matches of 'good|great' and 'terrible|bad' in each tweet string.
# Note: this matches substrings (e.g. 'goodness') and is case-sensitive.
df['positive words'] = df['tokenized_tweets'].str.count('|'.join(Positive_words))
df['negative words'] = df['tokenized_tweets'].str.count('|'.join(Negative_words))

# Label each tweet by comparing the two counts; equal counts are neutral.
conditions = [
    (df['positive words'] > df['negative words']),
    (df['negative words'] > df['positive words']),
    (df['negative words'] == df['positive words'])
]
choices = [
    'positive',
    'negative',
    'neutral'
]
df['overall'] = np.select(conditions, choices, default='')
df
Output:
                        tokenized_tweets  positive words  negative words   overall
0  football, was, good, we, played, well               1               0  positive
1    We, were, unlucky, today, bad, luck               0               1  negative
2       terrible, performance, bad, game               0               2  negative
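
One caveat with counting via str.count: the joined pattern is a regular expression matched anywhere in the string, so a token such as 'goodness' or 'badly' would also be counted, and 'Good' would be missed. If that matters for your data, a possible variant is to split each tweet into tokens and count exact, case-insensitive matches. This is only a sketch: the helper name count_matches, the lowercase set names, and the extra fourth tweet are my own additions for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'tokenized_tweets': [
    'football, was, good, we, played, well',
    'We, were, unlucky, today, bad, luck',
    'terrible, performance, bad, game',
    'goodness, badly, nothing, matched',  # hypothetical extra row: no exact matches
]})

# Sets give fast exact-membership tests (names are illustrative).
positive_words = {'good', 'great'}
negative_words = {'terrible', 'bad'}


def count_matches(tweet, vocabulary):
    """Count tokens of a comma-separated tweet that appear in vocabulary."""
    tokens = [token.strip().lower() for token in tweet.split(',')]
    return sum(token in vocabulary for token in tokens)


df['positive words'] = df['tokenized_tweets'].apply(count_matches, args=(positive_words,))
df['negative words'] = df['tokenized_tweets'].apply(count_matches, args=(negative_words,))

# More positive than negative -> positive, the reverse -> negative, otherwise neutral.
conditions = [
    df['positive words'] > df['negative words'],
    df['negative words'] > df['positive words'],
]
df['overall'] = np.select(conditions, ['positive', 'negative'], default='neutral')
print(df)
```

With this variant the fourth tweet gets zero matches on both sides and falls through to the neutral default, whereas the substring-based count would give it one positive and one negative match.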