Analyzing the sentiment of financial reports - Python

Time: 2019-10-01 06:05:22

Tags: python nlp nltk sentiment-analysis

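I have been trying to analyze the sentiment of financial statements. I am using NLTK's VADER sentiment analyzer (the vader_lexicon resource) after adding financial vocabulary to its lexicon. The financial terms come from the Loughran-McDonald word lists.

The code that adds the words is as follows: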

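import csv
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the VADER lexicon: nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()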

# stock market lexicon
stock_lex = pd.read_csv('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/stock_lex.csv')
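# Average the two scores into a single sentiment value per entry
stock_lex['sentiment'] = (stock_lex['Aff_Score'] + stock_lex['Neg_Score'])/2
stock_lex = dict(zip(stock_lex.Item, stock_lex.sentiment))
# Keep single-word entries only
stock_lex = {k:v for k,v in stock_lex.items() if len(k.split(' '))==1}
# Rescale to VADER's valence range of -4 to 4 (extremes computed once, outside the loop)
max_score = max(stock_lex.values())
min_score = min(stock_lex.values())
stock_lex_scaled = {}
for k, v in stock_lex.items():
    if v > 0:
        stock_lex_scaled[k] = v / max_score * 4
    else:
        stock_lex_scaled[k] = v / min_score * -4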

# Loughran and McDonald
# Positive words: one entry per row, taken from the first column
positive = []
with open('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/lm_positive.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        positive.append(row[0].strip())

# Negative words: multi-word entries are split into their individual words
negative = []
with open('C:/Users/ddutta070819/Downloads/EWS/StockSentimentTrading-master/lexicon_data/lm_negative.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        entry = row[0].strip().split(" ")
        if len(entry) > 1:
            negative.extend(entry)
        else:
            negative.append(entry[0])

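# Merge the lexicons; later updates overwrite earlier ones for shared words
final_lex = {}
final_lex.update({word:2.0 for word in positive})
final_lex.update({word:-2.0 for word in negative})
final_lex.update(stock_lex_scaled)
# Applied last, so VADER's own entries override the financial ones on conflicts
final_lex.update(sia.lexicon)
sia.lexicon = final_lex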
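
A side note on the merge order in the snippet above: dict.update overwrites existing keys, so whichever lexicon is applied last takes precedence for words that appear in more than one source. The sketch below illustrates this with two toy dictionaries; the words and valences are made up for illustration and are not taken from any of the files mentioned in the question.

```python
# Toy lexicons with made-up valences (hypothetical values, not from any real file).
financial_lex = {"loss": -2.0, "gain": 2.0, "liability": -2.0}
general_lex = {"loss": -1.3, "gain": 2.4, "happy": 2.7}

# Same order as in the question: the general lexicon is applied last,
# so its values replace the financial ones for shared words.
general_wins = {}
general_wins.update(financial_lex)
general_wins.update(general_lex)

# Reversed order: the financial lexicon is applied last and takes precedence.
financial_wins = {}
financial_wins.update(general_lex)
financial_wins.update(financial_lex)

print(general_wins["loss"])    # -1.3
print(financial_wins["loss"])  # -2.0
```

If the financial valences are meant to override the general-purpose ones, the two update calls would need to be swapped; whether that is desirable depends on the data.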

Although the overall results have improved, the model does not seem to understand the numbers. For example:

sia.polarity_scores('Royal Dutch Shell plc announced earnings results for the second quarter ended June 30, 2019. \
For the second quarter, the company announced total revenue was USD 91,838 million compared to USD 99,268 million a year \
ago. Net income was USD 2,998 million compared to USD 6,024 million a year ago. Basic earnings per share was USD 0.37 \
compared to USD 0.72 a year ago. For the half year, total revenue was USD 177,499 million compared to USD 190,382 million \
a year ago. Net income was USD 8,999 million compared to USD 11,923 million a year ago. Basic earnings per share was \
USD 1.11 compared to USD 1.44 a year ago. Diluted earnings per share was USD 1.1 compared to USD 1.42 a year ago.')

  

-0.81

That result is entirely correct, but even when I change the numbers:

sia.polarity_scores('Royal Dutch Shell plc announced earnings results for the second quarter ended June 30, 2019. \
For the second quarter, the company announced total revenue was USD 91,838 million compared to USD 69,268 million a year \
ago. Net income was USD 2,998 million compared to USD 1,024 million a year ago. Basic earnings per share was USD 0.37 \
compared to USD 0.17 a year ago. For the half year, total revenue was USD 177,499 million compared to USD 150,382 million \
a year ago. Net income was USD 8,999 million compared to USD 6,923 million a year ago. Basic earnings per share was \
USD 1.11 compared to USD 1.04 a year ago. Diluted earnings per share was USD 1.1 compared to USD 1.02 a year ago.')

  

-0.81

the sentiment score returned is still negative.
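
A likely explanation, stated here as an assumption rather than a full account of VADER's rules: a lexicon-based scorer looks each token up in its word list, and number tokens such as 2,998 are not in that list, so changing only the figures leaves the score unchanged. The toy scorer below is a deliberately simplified stand-in (it is not VADER's algorithm, and the lexicon values are made up) that shows the effect.

```python
import re

# Hypothetical mini-lexicon; the valences are illustrative only.
toy_lexicon = {"loss": -2.0, "decline": -2.0, "growth": 2.0}

def toy_score(text, lexicon):
    """Sum the valences of tokens found in the lexicon; every other token counts as 0."""
    tokens = re.findall(r"[A-Za-z]+|\d[\d,.]*", text.lower())
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

# The two texts differ only in the prior-year figure.
worse = ("The company reported a loss on disposals. "
         "Net income was USD 2,998 million compared to USD 6,024 million a year ago.")
better = ("The company reported a loss on disposals. "
          "Net income was USD 2,998 million compared to USD 1,024 million a year ago.")

print(toy_score(worse, toy_lexicon))
print(toy_score(better, toy_lexicon))
```

Both calls return the same value, because only the word "loss" contributes and the figures are ignored.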

Is there a way to help the model interpret these numbers based on the context of the surrounding text?

1 Answer:

Answer 0: (score: 0)

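As I understand it, you are only adjusting the sentiment estimate based on individual tokens as components of the sentence, and that is definitely not the right way to do sentiment analysis. To train a model that can classify text, the standard approach would be to use long short-term memory (LSTM) units in a neural network. You can use the Loughran-McDonald words to map tokens to the categories listed in that file. If all of your texts follow this schematic comparison ("X was USD A compared to USD B a year ago"), you can extract the numbers, compute the change (positive or negative), and then use that value when training the model so that it better captures the relationship with the numbers. This probably means mapping the size of the change into a separate evaluation category that can be fed into the LSTM model.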
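
The number-extraction step described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the metric names, the regular expression, the helper names (extract_changes, label_change) and the 5% threshold are all hypothetical choices fitted to the example text in the question, not part of NLTK or of the Loughran-McDonald files.

```python
import re

# Metric names are an assumption based on the example text in the question.
METRICS = ["total revenue", "net income",
           "basic earnings per share", "diluted earnings per share"]
NUMBER = r"\d[\d,]*(?:\.\d+)?"
PATTERN = re.compile(
    r"(?P<metric>" + "|".join(m.replace(" ", r"\s+") for m in METRICS) + r")"
    r"\s+was\s+USD\s+(?P<current>" + NUMBER + r")(?:\s+million)?"
    r"\s+compared\s+to\s+USD\s+(?P<previous>" + NUMBER + r")",
    re.IGNORECASE,
)

def to_float(number_text):
    """Convert a string such as '2,998' or '0.37' to a float."""
    return float(number_text.replace(",", ""))

def extract_changes(text):
    """Return (metric, percent change vs. prior period) for each match, in order."""
    changes = []
    for match in PATTERN.finditer(text):
        current = to_float(match.group("current"))
        previous = to_float(match.group("previous"))
        if previous == 0:
            continue  # percent change is undefined for a zero baseline
        pct = (current - previous) / previous * 100.0
        changes.append((match.group("metric").lower(), pct))
    return changes

def label_change(pct, threshold=5.0):
    """Map a percent change to a coarse category; the threshold is arbitrary."""
    if pct > threshold:
        return "up"
    if pct < -threshold:
        return "down"
    return "flat"

text = ("Net income was USD 2,998 million compared to USD 6,024 million a year ago. "
        "Basic earnings per share was USD 0.37 compared to USD 0.72 a year ago.")
for metric, pct in extract_changes(text):
    print(metric, round(pct, 1), label_change(pct))
```

The resulting labels (or the raw percent changes) could then be supplied to a classifier as a separate feature alongside the text, which is one way to read the suggestion of mapping the change into its own category.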