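I need to be able to show additional columns in the result table of the pairwise_cor function in R.
My code is based on the pairwise_cor examples in "Text Mining with R", which have been very helpful. The problem I have is that I want to carry identifying fields through into the table. Currently, when I run pairwise_cor, it returns a three-column table containing "item1", "item2", and "correlation". Using the example from the book, I would like to bring the "book" and "section" columns into the data frame that pairwise_cor returns, so that I can show where each combination of "item1" and "item2" comes from.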
library(dplyr)
library(tidyr)
library(tidytext)
library(ggplot2)
library(igraph)
library(ggraph)
library(widyr)
library(janeaustenr)  # provides austen_books()
austen_section_words <- austen_books() %>%
  filter(book == "Pride & Prejudice") %>%
  mutate(section = row_number() %/% 10) %>%
  filter(section > 0) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word)
austen_section_words
# A tibble: 37,240 x 3
   book              section word
   <fct>               <dbl> <chr>
 1 Pride & Prejudice       1 truth
 2 Pride & Prejudice       1 universally
 3 Pride & Prejudice       1 acknowledged
 4 Pride & Prejudice       1 single
 5 Pride & Prejudice       1 possession
 6 Pride & Prejudice       1 fortune
 7 Pride & Prejudice       1 wife
 8 Pride & Prejudice       1 feelings
 9 Pride & Prejudice       1 views
10 Pride & Prejudice       1 entering
# ... with 37,230 more rows
word_cors <- austen_section_words %>%
  group_by(word) %>%
  filter(n() >= 20) %>%
  pairwise_cor(word, section, sort = TRUE)
word_cors
# A tibble: 154,842 x 3
   item1     item2     correlation
   <chr>     <chr>           <dbl>
 1 bourgh    de              0.951
 2 de        bourgh          0.951
 3 pounds    thousand        0.701
 4 thousand  pounds          0.701
 5 william   sir             0.664
 6 sir       william         0.664
 7 catherine lady            0.663
 8 lady      catherine       0.663
 9 forster   colonel         0.622
10 colonel   forster         0.622
# ... with 154,832 more rows
I would like to include the "book" and "section" columns in the correlation table.
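
One thing worth noting: pairwise_cor(word, section) computes each correlation across all sections, so a word pair has no single "section" of its own. A possible workaround, sketched here under that assumption, is to join the correlation table back to the tokenized data and list every book/section in which both words of a pair occur. The small tibbles below (section_words, cors) are hypothetical stand-ins for austen_section_words and word_cors so the snippet runs on its own; pair_sections is likewise just an illustrative name.

```r
library(dplyr)

# Hypothetical toy data standing in for austen_section_words and word_cors
section_words <- tibble(
  book    = "Pride & Prejudice",
  section = c(1, 1, 2, 2, 3),
  word    = c("lady", "catherine", "lady", "catherine", "lady")
)
cors <- tibble(item1 = "lady", item2 = "catherine", correlation = 0.5)

# One row per (book, section, word), so repeated words do not duplicate matches
word_locations <- distinct(section_words, book, section, word)

pair_sections <- cors %>%
  # attach every book/section where item1 occurs
  inner_join(word_locations, by = c("item1" = "word")) %>%
  # keep only those book/sections where item2 occurs as well
  inner_join(word_locations,
             by = c("item2" = "word", "book" = "book", "section" = "section"))

pair_sections
```

On the full data this produces one row per pair per shared section, which can get very large, so it is probably sensible to filter word_cors first (for example to a handful of words or to correlations above some threshold) before joining. Recent dplyr versions may also warn about a many-to-many relationship in the first join, which is expected here.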