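I am analyzing the pairwise correlation of words that appear in user reviews, and I want to plot them as a correlation network graph.
My sample data looks like this: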
review_corwords
     Label Rating word
1        1      1 connect
1.1      1      1 gps
1.2      1      1 app
1.3      1      1 connect
1.4      1      1 gps
1.5      1      1 matter
1.6      1      1 long
1.7      1      1 gps
1.8      1      1 set
1.9      1      1 high
1.10     1      1 accuracy
1.11     1      1 setting
1.12     1      1 appear
1.13     1      1 set
1.14     1      1 app
1.15     1      1 useless
1.16     1      1 cant
1.17     1      1 track
1.18     1      1 workout
2        1      5 wish
2.1      1      5 would
2.2      1      5 interest
2.3      1      5 google
2.4      1      5 provide
2.5      1      5 weekly
2.6      1      5 monthly
2.7      1      5 summary
3        1      1 useless
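To make this easier to reproduce, here is a minimal sketch that rebuilds a few of the rows above as a plain data frame. It assumes the first, unlabeled column in the printout is just the row name, so the remaining columns are Label, Rating and word. The name `n_labels` is only introduced here for illustration.

```r
# A handful of rows copied from the sample above (row names dropped).
# Assumption: the unlabeled first column in the printout is the row name.
review_corwords <- data.frame(
  Label  = c(1, 1, 1, 1, 1, 1),
  Rating = c(1, 1, 1, 5, 5, 1),
  word   = c("connect", "gps", "app", "wish", "would", "useless"),
  stringsAsFactors = FALSE
)

# Number of distinct values in the column used as the grouping feature.
n_labels <- length(unique(review_corwords$Label))
print(n_labels)
```

Read this way, every row shown in the sample carries the same Label value, which may or may not hold for the full data set.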
Then I run this:
library(dplyr)  # provides %>%
library(widyr)
# count words co-occurring within a label
word_pairs <- review_corwords %>%
  pairwise_count(word, Label, sort = TRUE)
The output is as follows:
# A tibble: 16,333,722 x 3
item1 item2 n
<chr> <chr> <dbl>
1 gps connect 1
2 app connect 1
3 matter connect 1
4 long connect 1
5 set connect 1
However, when I try to compute the correlations, I get the following:
word_cors <- review_corwords %>%
  group_by(word) %>%
  pairwise_cor(word, Label, sort = TRUE)
# A tibble: 16,333,722 x 3
item1 item2 correlation
<chr> <chr> <dbl>
1 gps connect NaN
2 app connect NaN
3 matter connect NaN
4 long connect NaN
5 set connect NaN
6 high connect NaN
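One thing that might explain the NaN values is that a correlation between two words is undefined when there is only a single Label (or when a word occurs in every Label), because the word indicators then have no variance. The sketch below illustrates this with base R's `cor()` on a binary Label-by-word incidence matrix. It is a stand-in for the idea, not widyr's actual implementation, and `word_incidence_cor`, `single_label` and `multi_label` are hypothetical names with made-up toy rows.

```r
# Correlate words across labels using a 0/1 incidence matrix
# (rows = labels, columns = words). Base R only; not widyr's internals.
word_incidence_cor <- function(df) {
  counts <- unclass(table(df$Label, df$word))  # label x word counts
  incidence <- (counts > 0) * 1                # 1 if the word occurs in the label
  suppressWarnings(cor(incidence))             # Pearson on binary columns
}

# Toy data, one label only: every pairwise correlation is undefined.
single_label <- data.frame(
  Label = c(1, 1, 1),
  word  = c("gps", "connect", "app")
)
cor_single <- word_incidence_cor(single_label)

# Toy data, three labels: the correlations are defined.
multi_label <- data.frame(
  Label = c(1, 1, 2, 2, 3),
  word  = c("gps", "connect", "gps", "connect", "app")
)
cor_multi <- word_incidence_cor(multi_label)

print(cor_single)
print(cor_multi)
```

If this is what is going on, checking `length(unique(review_corwords$Label))` on the full data would show whether Label actually distinguishes the individual reviews.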
I need to get the correct pairwise correlation values for the words. Any help would be appreciated.