Question

I currently have a pandas DataFrame containing tokenized tweets.

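I need to go through each tweet and determine whether it is positive or negative, so that I can add follow-up columns with the positive and negative words.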

Sample data:

tokenized_tweets =  ['football, was, good, we, played, well' , 'We, were, unlucky, today, bad, luck' , 'terrible, performance, bad, game']

I need to run a loop over the tokenized_tweets column and determine whether each tweet is positive or negative.

For this example, the positive and negative words are:

Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']

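The desired output is a DataFrame containing the tweet, how many positive words each tweet contains, how many negative words each tweet contains, and whether the tweet is positive, negative or neutral.

Whether a tweet is positive, negative or neutral should be decided by whether it contains more positive or more negative words.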

Desired output:

Tokenized tweet                         positive words   negative words   overall
football, was, good, we, played, well          1                0         positive
We, were, unlucky, today, bad, luck            0                1         negative
terrible, performance, bad, game               0                2         negative
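A minimal sketch of the explicit loop the question describes, assuming each tweet is a single comma-separated string (as in the sample data) and that a word only counts when a whole token matches. Lowercasing each token and labelling equal counts as neutral are assumptions, not something the question specifies.

```python
import pandas as pd

# Sample data from the question: one comma-separated string of tokens per tweet
df = pd.DataFrame({'tokenized_tweets': [
    'football, was, good, we, played, well',
    'We, were, unlucky, today, bad, luck',
    'terrible, performance, bad, game',
]})

# Sets make each membership check cheap
Positive_words = {'good', 'great'}
Negative_words = {'terrible', 'bad'}

pos_counts, neg_counts, labels = [], [], []
for tweet in df['tokenized_tweets']:
    # Assumption: tokens are comma-separated; strip spaces and lowercase each token
    tokens = [t.strip().lower() for t in tweet.split(',')]
    pos = sum(t in Positive_words for t in tokens)
    neg = sum(t in Negative_words for t in tokens)
    pos_counts.append(pos)
    neg_counts.append(neg)
    # Assumption: a tie (including zero of each) is labelled neutral
    if pos > neg:
        labels.append('positive')
    elif neg > pos:
        labels.append('negative')
    else:
        labels.append('neutral')

df['positive words'] = pos_counts
df['negative words'] = neg_counts
df['overall'] = labels
print(df)
```

Because tokens are compared whole, a word such as "badly" would not be counted as "bad".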

Answer 1

import pandas as pd
import numpy as np

# Sample data: one comma-separated string of tokens per tweet
df = pd.DataFrame({'tokenized_tweets': ['football, was, good, we, played, well',
                                        'We, were, unlucky, today, bad, luck',
                                        'terrible, performance, bad, game']})

Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']

# str.count treats 'good|great' as a regular expression and counts its matches in each row
df['positive words'] = df['tokenized_tweets'].str.count('|'.join(Positive_words))
df['negative words'] = df['tokenized_tweets'].str.count('|'.join(Negative_words))

# np.select returns, for each row, the choice paired with the first condition that is True
conditions = [
    (df['positive words'] > df['negative words']),
    (df['negative words'] > df['positive words']),
    (df['negative words'] == df['positive words'])
]

choices = [
    'positive',
    'negative',
    'neutral'
]

df['overall'] = np.select(conditions, choices, default='')

df  # displays the frame in a notebook; use print(df) in a script

Output:

   tokenized_tweets                        positive words   negative words   overall
0  football, was, good, we, played, well          1                0         positive
1  We, were, unlucky, today, bad, luck            0                1         negative
2  terrible, performance, bad, game               0                2         negative
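One caveat with the answer above: `str.count` matches substrings and is case-sensitive, so "bad" would also be counted inside "badly", while "Good" would not be counted. A sketch of a variant that counts whole words only, assuming case-insensitive matching is wanted; the fourth row is a hypothetical example added to show the difference, and the helper `word_pattern` is likewise hypothetical.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'tokenized_tweets': [
    'football, was, good, we, played, well',
    'We, were, unlucky, today, bad, luck',
    'terrible, performance, bad, game',
    'Good, game, badly, played',  # hypothetical row: "Good" capitalised, "badly" contains "bad"
]})

Positive_words = ['good', 'great']
Negative_words = ['terrible', 'bad']

def word_pattern(words):
    # Hypothetical helper: \b anchors each alternative to word boundaries,
    # re.escape guards against regex metacharacters in the word lists
    return r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

df['positive words'] = df['tokenized_tweets'].str.count(word_pattern(Positive_words), flags=re.IGNORECASE)
df['negative words'] = df['tokenized_tweets'].str.count(word_pattern(Negative_words), flags=re.IGNORECASE)

# Ties fall through to the default, so the explicit equality condition is not needed
df['overall'] = np.select(
    [df['positive words'] > df['negative words'],
     df['negative words'] > df['positive words']],
    ['positive', 'negative'],
    default='neutral',
)
print(df)
```

With the original pattern the hypothetical fourth row would count 0 positive and 1 negative words; with word boundaries and `re.IGNORECASE` it counts 1 positive and 0 negative.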

How can I go through a DataFrame and classify the text as positive or negative?
