Good morning everyone, I have a dataset similar to this one:
> dataset2
   num article_num paragraph_num paragraph                                         date       X
1    1           1             1 Buy this stock, it's on the rise.                 20/12/2017 NA
2    2           1             2 People think it's going to be a waste of money.   20/12/2017 NA
3    3           1             3 Don't listen to them, buy.                        20/12/2017 NA
4    4           2             1 Things are not going well down here.              21/12/2017 NA
5    5           2             2 I'd suggest to sell this stock.                   21/12/2017 NA
6    6           2             3 Short paragraphs, long positions.                 21/12/2017 NA
7    7           2             4 In conclusion, I need help.                       21/12/2017 NA
8    8           3             1 Bad results earlier today for a wonderful stock.  23/12/2017 NA
9    9           3             2 But it is coming back strong now.                 23/12/2017 NA
10  10           4             1 Well, it's Christmas time, be good.               25/12/2017 NA
11  11           4             2 Buy this stock, you will not regret it.           25/12/2017 NA
12  12           4             3 Oh, and thank me later.                           25/12/2017 NA
13  13           5             1 Coming back stronger than ever with the new year. 02/01/2018 NA
14  14           5             2 Sell this stock, I'm warning you.                 02/01/2018 NA
15  15           5             3 Buy this one though, very bullish.                02/01/2018 NA
I also have a .csv file containing "sentiment weights"; here is a sample:
> lexicon
    keyword         sw
1        up  0.2332850
2     short -0.5811601
3      down -0.4627161
4       buy  0.2443343
5      good  0.2396273
6      long  0.2972727
7      sell -0.4274253
8      news  0.2599005
9  emojipos  0.3257571
10     nice  0.4140183
What I want to do is apply this lexicon to the articles (or better, the paragraphs) I have, for sentiment analysis... I want to use this particular lexicon because it is field-specific, so it should work better for my project than a "general-purpose" one.
Thanks in advance.
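A minimal sketch of one way to do this with dplyr and tidytext, assuming `dataset2` and `lexicon` are already loaded as data frames with the column names shown above (`paragraph`, `keyword`, `sw`):

```r
library(dplyr)
library(tidytext)

# Split each paragraph into one lowercase word per row, then keep only
# the words that appear in the lexicon, attaching their weight `sw`.
scored_words <- dataset2 %>%
  unnest_tokens(word, paragraph) %>%
  inner_join(lexicon, by = c("word" = "keyword"))
```

Note that `inner_join` drops words not found in the lexicon; use `left_join` instead if you want to keep every token and see which ones went unscored.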
UPDATE: I actually used tidytext and it worked; I obtained this (on my original text, not the sample I provided):
# A tibble: 1,891 x 6
     num article_num paragraph_num date       word          score
   <int>       <int>         <int> <fct>      <chr>         <dbl>
 1     1           1             1 01/12/2017 summary       0.284
 2     1           1             1 01/12/2017 visa          0.741
 3     1           1             1 01/12/2017 strong        0.451
 4     1           1             1 01/12/2017 network       0.375
 5     1           1             1 01/12/2017 top          -0.323
 6     1           1             1 01/12/2017 grow          0.430
 7     1           1             1 01/12/2017 pe           -0.234
 8     2           1             2 01/12/2017 visa          0.741
 9     2           1             2 01/12/2017 date          0.274
10     2           1             2 01/12/2017 appreciated   0.620
# ... with 1,881 more rows
How can I get R to compute the total score for each paragraph_num and for each article_num?
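A minimal sketch using dplyr, assuming the per-word scores shown above are stored in a data frame called `scored_words` (a hypothetical name) with columns `article_num`, `paragraph_num`, and `score`:

```r
library(dplyr)

# Total sentiment per paragraph within each article
paragraph_scores <- scored_words %>%
  group_by(article_num, paragraph_num) %>%
  summarise(total_score = sum(score), .groups = "drop")

# Total sentiment per article
article_scores <- scored_words %>%
  group_by(article_num) %>%
  summarise(total_score = sum(score), .groups = "drop")
```

The `.groups = "drop"` argument requires dplyr >= 1.0; on older versions you can simply omit it (you may see a grouping message, which is harmless here).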