Applying a weighted lexicon to a data frame for sentiment analysis

Date: 2018-03-13 11:13:06

Tags: r sentiment-analysis weighted

Good morning everyone. I have a dataset similar to this one:

> dataset2
   num article_num paragraph_num                                         paragraph       date  X
1    1           1             1                 Buy this stock, it's on the rise. 20/12/2017 NA
2    2           1             2   People think it's going to be a waste of money. 20/12/2017 NA
3    3           1             3                        Don't listen to them, buy. 20/12/2017 NA
4    4           2             1              Things are not going well down here. 21/12/2017 NA
5    5           2             2                   I'd suggest to sell this stock. 21/12/2017 NA
6    6           2             3                 Short paragraphs, long positions. 21/12/2017 NA
7    7           2             4                       In conclusion, I need help. 21/12/2017 NA
8    8           3             1  Bad results earlier today for a wonderful stock. 23/12/2017 NA
9    9           3             2                 But it is coming back strong now. 23/12/2017 NA
10  10           4             1               Well, it's Christmas time, be good. 25/12/2017 NA
11  11           4             2           Buy this stock, you will not regret it. 25/12/2017 NA
12  12           4             3                           Oh, and thank me later. 25/12/2017 NA
13  13           5             1 Coming back stronger than ever with the new year. 02/01/2018 NA
14  14           5             2                 Sell this stock, I'm warning you. 02/01/2018 NA
15  15           5             3                Buy this one though, very bullish. 02/01/2018 NA

I also have a .csv file containing words with their "sentiment weights". Here is a sample:

> lexicon
                     keyword         sw
1                         up  0.2332850
2                      short -0.5811601
3                       down -0.4627161
4                        buy  0.2443343
5                       good  0.2396273
6                       long  0.2972727
7                       sell -0.4274253
8                       news  0.2599005
9                   emojipos  0.3257571
10                      nice  0.4140183
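A minimal base R sketch of loading such a file, assuming it has a header row with the columns keyword and sw as in the printout above. The file name lexicon.csv is hypothetical, so the sample rows are inlined with read.csv(text = ...) to keep the example self-contained. Lower-casing the keywords assumes the text will also be lower-cased when it is split into words.

```r
# In practice: lexicon <- read.csv("lexicon.csv")  (hypothetical file name).
# Here the sample rows from the question are inlined via the text argument.
lexicon_csv <- "keyword,sw
up,0.2332850
short,-0.5811601
down,-0.4627161
buy,0.2443343
good,0.2396273
long,0.2972727
sell,-0.4274253
news,0.2599005
emojipos,0.3257571
nice,0.4140183"

# Keep keywords as character, not factor
lexicon <- read.csv(text = lexicon_csv, stringsAsFactors = FALSE)

# Normalise keywords so they match lower-cased tokens from the text
lexicon$keyword <- tolower(trimws(lexicon$keyword))

# Drop duplicate keywords so each word maps to exactly one weight
lexicon <- lexicon[!duplicated(lexicon$keyword), ]

str(lexicon)
```

Passing stringsAsFactors = FALSE explicitly matters on R versions before 4.0.0, where read.csv() turned character columns into factors by default.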

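What I want to do is apply this lexicon to a sentiment analysis of my articles (or, better, of the individual paragraphs). I want to use this particular lexicon because it is domain-specific, so it would suit my project better than a generic, general-purpose lexicon.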

Thanks in advance.
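A minimal base R sketch of applying a weighted lexicon to paragraphs: lower-case each paragraph, split it into words, and inner-join the words against the lexicon so each matched word carries its weight. It uses the first six rows of dataset2 and the lexicon sample from the question; the tokenising regex (letters and apostrophes only) is an assumption, not something the question specifies.

```r
# Lexicon rows copied from the sample in the question
lexicon <- data.frame(
  keyword = c("up", "short", "down", "buy", "good",
              "long", "sell", "news", "emojipos", "nice"),
  sw = c(0.2332850, -0.5811601, -0.4627161, 0.2443343, 0.2396273,
         0.2972727, -0.4274253, 0.2599005, 0.3257571, 0.4140183),
  stringsAsFactors = FALSE
)

# First six paragraphs of dataset2 from the question
dataset2 <- data.frame(
  num = 1:6,
  article_num = c(1, 1, 1, 2, 2, 2),
  paragraph_num = c(1, 2, 3, 1, 2, 3),
  paragraph = c("Buy this stock, it's on the rise.",
                "People think it's going to be a waste of money.",
                "Don't listen to them, buy.",
                "Things are not going well down here.",
                "I'd suggest to sell this stock.",
                "Short paragraphs, long positions."),
  stringsAsFactors = FALSE
)

# Tokenise: lower-case, then split on anything that is not a letter or apostrophe
tokens <- strsplit(tolower(dataset2$paragraph), "[^a-z']+")

# One row per (paragraph, word), repeating the paragraph identifiers
words <- data.frame(
  num = rep(dataset2$num, lengths(tokens)),
  article_num = rep(dataset2$article_num, lengths(tokens)),
  paragraph_num = rep(dataset2$paragraph_num, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)
words <- words[words$word != "", ]

# Inner join: keep only words found in the lexicon and attach their weight
scored <- merge(words, lexicon, by.x = "word", by.y = "keyword")
names(scored)[names(scored) == "sw"] <- "score"

print(scored[order(scored$num), ])
```

This is a plain bag-of-words lookup: negation is ignored (so "not going well" is not flipped), and words missing from the lexicon simply drop out. With tidytext, roughly the same steps are unnest_tokens(word, paragraph) followed by dplyr's inner_join(lexicon, by = c("word" = "keyword")), which gives a word/score table like the one in the update.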

Update: I ended up using tidytext and it worked. This is what I obtained (on the original text, not on the sample I provided above):

# A tibble: 1,891 x 6
 num article_num paragraph_num date       word         score
   <int>       <int>         <int> <fct>      <chr>        <dbl>
 1     1           1             1 01/12/2017 summary      0.284
 2     1           1             1 01/12/2017 visa         0.741
 3     1           1             1 01/12/2017 strong       0.451
 4     1           1             1 01/12/2017 network      0.375
 5     1           1             1 01/12/2017 top         -0.323
 6     1           1             1 01/12/2017 grow         0.430
 7     1           1             1 01/12/2017 pe          -0.234
 8     2           1             2 01/12/2017 visa         0.741
 9     2           1             2 01/12/2017 date         0.274
10     2           1             2 01/12/2017 appreciated  0.620
# ... with 1,881 more rows

How can I get R to compute the total score for each paragraph_num and for each article_num?
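A minimal base R sketch using aggregate() with a formula, built on the ten rows shown in the tibble above (the scores are the rounded values from that printout). Because paragraph_num restarts at 1 in every article, the paragraph totals have to be grouped by article_num and paragraph_num together; grouping by paragraph_num alone would merge paragraph 1 of every article into one group.

```r
# Word-level scores: the ten rows printed in the question's tibble
scored <- data.frame(
  num = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  article_num = rep(1, 10),
  paragraph_num = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  date = rep("01/12/2017", 10),
  word = c("summary", "visa", "strong", "network", "top",
           "grow", "pe", "visa", "date", "appreciated"),
  score = c(0.284, 0.741, 0.451, 0.375, -0.323,
            0.430, -0.234, 0.741, 0.274, 0.620),
  stringsAsFactors = FALSE
)

# Total score per paragraph (paragraph_num is only unique within an article)
paragraph_scores <- aggregate(score ~ article_num + paragraph_num,
                              data = scored, FUN = sum)

# Total score per article
article_scores <- aggregate(score ~ article_num, data = scored, FUN = sum)

print(paragraph_scores)
print(article_scores)
```

Since num already identifies each paragraph uniquely, aggregate(score ~ num, ...) gives the same paragraph totals. Using FUN = mean instead of sum gives the average weight per matched word, which stops long paragraphs from scoring higher just because they contain more lexicon words; which one fits depends on the analysis. In dplyr the paragraph totals would be scored %>% group_by(article_num, paragraph_num) %>% summarise(score = sum(score)).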
