Question

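I have been looking into TextBlob to compute sentiment scores (polarity and subjectivity) for a list of articles that I have compiled in an Excel sheet.

Here is a sample of the sheet: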

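11/03/2004 04:03 At least 60 people were killed in three bomb attacks on crowded Madrid trains in Spain's worst-ever terrorist attack, said the EFE news wire and other media. The Red Cross said at least 200 people were injured. ``This is a massacre,'' said Socialist Party leader Jose Luis Rodriguez Zapatero, condemning the Basque terrorist group ETA.

2005/07/07 04:41 London shut down its subway system and evacuated all stations after emergency services were called to explosions around the financial district.

January 12, 2009 04:00 American International Group (AIG) today announced that it has completed two previously announced transactions with the Federal Reserve Bank of New York (FRBNY) involving $25 billion of the debt AIG owes the FRBNY, in exchange for the FRBNY's acquisition of preferred equity interests in certain newly formed subsidiaries.

August 22, 2013 11:38 NASDAQ shuts down for 3 hours due to a computer problem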

I have been able to use TextBlob in the simplest way, by running each line individually like this:

analysis = TextBlob("NASDAQ shuts down for 3 hours due to a computer problem")
print(analysis.sentiment)

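I would like to import the Excel file, which holds the date and time in one column and the article in another, then loop over each row to compute the polarity and subjectivity scores and save them to a file.

I have tried to adapt code from Thomson Reuters News Analytics as follows: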

import pandas as pd
import numpy as np
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

df['Polarity'] = np.nan
df['Subjectivity'] = np.nan

pd.options.mode.chained_assignment = None

for idx, articles in enumerate(df['articles'].values):  # for each row in our df dataframe
    sentA = TextBlob("articles")  # pass the text only article to TextBlob to analyze
    df['Polarity'].iloc[idx] = sentA.sentiment.polarity  # write sentiment polarity back to df
    df['Subjectivity'].iloc[idx] = sentA.sentiment.subjectivity  # write sentiment subjectivity score back to df
df.head()

df.to_csv("out.csv", index=False)

The code does not work, though... I am not getting any scores.

Any suggestions on how to do this?

I am new to Python (I am using PyCharm). I mostly code in Stata and Matlab.

Please help!

Answer 1

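You should move the logic into a function and then use pd.Series.map() to apply that function to every value in the articles column of the DataFrame. Using .map() or .apply() is cleaner, and generally faster, than a manual loop that writes each cell back with .iloc.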

import pandas as pd
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

# function to extract polarity and subjectivity from text
def process_text(text):
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity

# apply to each value of the 'articles' Series using the pd.Series.map method
df["polarity"], df["subjectivity"] = zip(*df.articles.map(process_text))

df.head()

df.to_csv("out.csv", index=False)

Disclaimer: I have not tested this.
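Since the snippet above is untested, here is a hedged variant of the same map-based idea that also tolerates empty cells. It is only a sketch: add_sentiment_columns, safe_score, textblob_scores and the scorer parameter are hypothetical names introduced for illustration, not part of pandas or TextBlob, and the column name articles is taken from the question. TextBlob is imported lazily inside the default scorer so that the helper can be tried with any stand-in scoring function.

```python
import pandas as pd


def textblob_scores(text):
    """Return (polarity, subjectivity) for one string using TextBlob."""
    # Imported here so the helper below also works with a stand-in scorer.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def add_sentiment_columns(df, column="articles", scorer=textblob_scores):
    """Return a copy of df with 'polarity' and 'subjectivity' columns added.

    Missing (NaN/None) or blank cells get NaN scores instead of raising.
    """
    def safe_score(value):
        # Only non-empty strings are passed on to the scorer.
        if not isinstance(value, str) or not value.strip():
            return float("nan"), float("nan")
        return scorer(value)

    scores = df[column].map(safe_score)  # Series of (polarity, subjectivity) tuples
    out = df.copy()
    out["polarity"] = [polarity for polarity, _ in scores]
    out["subjectivity"] = [subjectivity for _, subjectivity in scores]
    return out
```

With TextBlob installed, something like add_sentiment_columns(df).to_csv("out.csv", index=False) should give the same output file as the code above, with NaN in both score columns for any row whose article cell is empty.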

Answer 2

Thanks for reaching out.

I actually got the code working a while ago.

Here is what it looks like:

import pandas as pd
import numpy as np
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

df['Polarity'] = np.nan
df['Subjectivity'] = np.nan

pd.options.mode.chained_assignment = None

for idx, articles in enumerate(df['articles'].values):  # for each row in our df DataFrame
    if articles:  # the check that was missing before
        sentA = TextBlob(articles)  # pass the text-only article to TextBlob to analyse
        df['Polarity'].iloc[idx] = sentA.sentiment.polarity  # write sentiment polarity back to df
        df['Subjectivity'].iloc[idx] = sentA.sentiment.subjectivity  # write sentiment subjectivity score back to df

df.head()

df.to_csv("Sentiment_Scores.csv", index=False)

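So I was basically missing the "if articles" bit, which ends up looping over each article to retrieve the scores.

I really appreciate you getting in touch.

Thanks a lot!

Regards, Parvesh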
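Two details of the loop above may be worth spelling out. First, this version passes the variable articles to TextBlob, whereas the quoted "articles" in the question's version is a string literal, so every row was scored on the same word. Second, NaN is truthy in Python, so "if articles:" skips empty strings but not missing cells. The sketch below shows one way to write the same loop with both points handled, using DataFrame.at instead of chained assignment (which pandas warns about and which does not write back under copy-on-write). score_rows and scorer are hypothetical names, and a unique index such as the default one from read_csv is assumed.

```python
import numpy as np
import pandas as pd


def score_rows(df, scorer, column="articles"):
    """Return a copy of df with Polarity/Subjectivity filled in row by row.

    `scorer` is any function mapping a string to (polarity, subjectivity).
    """
    out = df.copy()
    out["Polarity"] = np.nan
    out["Subjectivity"] = np.nan
    for idx, text in out[column].items():
        # NaN is truthy, so `if text:` alone would not skip missing cells.
        if isinstance(text, str) and text.strip():
            # Pass the variable itself, not the string literal "articles".
            polarity, subjectivity = scorer(text)
            out.at[idx, "Polarity"] = polarity
            out.at[idx, "Subjectivity"] = subjectivity
    return out
```

With TextBlob imported, calling score_rows(df, lambda text: tuple(TextBlob(text).sentiment)) should reproduce the scores from the loop above, since TextBlob's sentiment is a (polarity, subjectivity) named tuple.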

TextBlob - Looping over articles to compute polarity and subjectivity scores
