Question

I need to display additional columns in the result table of the pairwise_cor function in R.

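My code is based on the pairwise_cor examples in "Text Mining with R", which have been very helpful. My problem is that I want to carry identifying fields into the result table. Currently, pairwise_cor returns a three-column table with "item1", "item2", and "correlation". Using the example from the "Text Mining with R" book, I would like to bring the "book" and "section" columns into the pairwise_cor output data frame, so that I can show where each "item1"/"item2" combination comes from.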

library(dplyr)
library(tidyr)
library(tidytext)
library(janeaustenr)  # austen_books() comes from janeaustenr, not tidytext
library(ggplot2)
library(igraph)
library(ggraph)
library(widyr)

austen_section_words <- austen_books() %>%
  filter(book == "Pride & Prejudice") %>%
  mutate(section = row_number() %/% 10) %>%  # every 10 lines of text form one section
  filter(section > 0) %>%
  unnest_tokens(word, text) %>%              # one row per word
  filter(!word %in% stop_words$word)         # drop stop words

austen_section_words


# A tibble: 37,240 x 3
   book              section word        
   <fct>               <dbl> <chr>       
 1 Pride & Prejudice       1 truth
 2 Pride & Prejudice       1 universally
 3 Pride & Prejudice       1 acknowledged
 4 Pride & Prejudice       1 single
 5 Pride & Prejudice       1 possession
 6 Pride & Prejudice       1 fortune
 7 Pride & Prejudice       1 wife
 8 Pride & Prejudice       1 feelings
 9 Pride & Prejudice       1 views
10 Pride & Prejudice       1 entering
# ... with 37,230 more rows


word_cors <- austen_section_words %>%
  group_by(word) %>%
  filter(n() >= 20) %>%
  pairwise_cor(word, section, sort = TRUE)

word_cors


# A tibble: 154,842 x 3
   item1     item2     correlation
   <chr>     <chr>           <dbl>
 1 bourgh    de              0.951
 2 de        bourgh          0.951
 3 pounds    thousand        0.701
 4 thousand  pounds          0.701
 5 william   sir             0.664
 6 sir       william         0.664
 7 catherine lady            0.663
 8 lady      catherine       0.663
 9 forster   colonel         0.622
10 colonel   forster         0.622
# ... with 154,832 more rows
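For context, here is a sketch of why the output has only three columns. As I understand widyr's defaults (no `value` argument, `method = "pearson"`), each word is treated as present (1) or absent (0) in each section, and the correlation is the Pearson (phi) correlation of two words' presence vectors across all sections. Each row therefore summarizes every section at once, so no single `section` value belongs to it. The `toy` data below is made up to mirror `austen_section_words`, and the check with base `cor()` is my own reconstruction, not widyr's internals.

```r
library(dplyr)
library(widyr)

# Hypothetical toy data: one row per (book, section, word), no duplicates
toy <- tibble::tribble(
  ~book, ~section, ~word,
  "A",   1,        "lady",
  "A",   1,        "catherine",
  "A",   2,        "lady",
  "A",   2,        "catherine",
  "A",   3,        "sir",
  "A",   3,        "william",
  "A",   4,        "lady",
  "A",   4,        "sir"
)

# widyr result: only item1, item2, correlation
word_cors_toy <- toy %>% pairwise_cor(word, section)

# Rebuild one value by hand: 0/1 presence of each word in sections 1..4
sections <- sort(unique(toy$section))
presence <- function(w) as.numeric(sections %in% toy$section[toy$word == w])

# lady = 1,1,0,1 and catherine = 1,1,0,0 across the four sections
manual <- cor(presence("lady"), presence("catherine"))  # about 0.577
```

The value for the lady/catherine pair depends on all four sections together, which is why a per-row `section` column is not well defined.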

I would like the correlation table to include the "book" and "section" columns.
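One way to keep `book` as a column is a sketch that assumes the data contain several books: compute the correlations separately within each book using dplyr's `group_modify()`, so the grouping column is carried into the output. `section` still cannot be a column here, because each correlation spans all sections of a book. `toy` and `cors_by_book` are hypothetical names. For the real Austen data, this would presumably mean dropping the `filter(book == "Pride & Prejudice")` step, numbering sections within each book (`group_by(book)` before `mutate(section = ...)`), and filtering with `group_by(book, word)`. That adaptation is untested.

```r
library(dplyr)
library(widyr)

# Hypothetical toy data with two books
toy <- tibble::tribble(
  ~book, ~section, ~word,
  "A",   1,        "lady",
  "A",   1,        "catherine",
  "A",   2,        "lady",
  "A",   2,        "catherine",
  "A",   3,        "sir",
  "B",   1,        "lady",
  "B",   1,        "sir",
  "B",   2,        "catherine",
  "B",   3,        "lady",
  "B",   3,        "sir"
)

# Run pairwise_cor once per book; group_modify() prepends the book column
cors_by_book <- toy %>%
  group_by(book) %>%
  group_modify(~ pairwise_cor(.x, word, section)) %>%
  ungroup()
```

In this toy data, lady and catherine always co-occur in book A but never in book B, so the same pair gets a different correlation in each book.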

How can I add book and section to the pairwise_cor results?
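For `section`, one possible workaround keeps the correlation table as it is and attaches, for each pair, the sections in which both words occur. This is my own sketch, not a widyr feature. It looks up each word's sections and intersects them with base `intersect()`. `toy`, `sections_by_word`, and `shared_sections` are hypothetical names. Because the question's data contain only Pride & Prejudice, `book` is constant there and could simply be added with `mutate()`.

```r
library(dplyr)
library(widyr)

# Hypothetical toy data mirroring austen_section_words
toy <- tibble::tribble(
  ~book, ~section, ~word,
  "A",   1,        "lady",
  "A",   1,        "catherine",
  "A",   2,        "lady",
  "A",   2,        "catherine",
  "A",   3,        "sir",
  "A",   3,        "william",
  "A",   4,        "lady",
  "A",   4,        "sir"
)

word_cors_toy <- toy %>% pairwise_cor(word, section, sort = TRUE)

# One row per word, with a list of every section it appears in
sections_by_word <- toy %>%
  group_by(word) %>%
  summarise(sections = list(sort(unique(section))), .groups = "drop")

# Look up both words' sections, keep the overlap as a readable string
pair_sections <- word_cors_toy %>%
  left_join(sections_by_word, by = c("item1" = "word")) %>%
  left_join(sections_by_word, by = c("item2" = "word"), suffix = c("_1", "_2")) %>%
  mutate(
    shared_sections = mapply(intersect, sections_1, sections_2, SIMPLIFY = FALSE),
    shared_sections = vapply(shared_sections, paste, character(1), collapse = ", ")
  ) %>%
  select(item1, item2, correlation, shared_sections)
```

Pairs that never co-occur (for example lady and william here) get an empty string, which makes it clear that their correlation comes from sections where only one of the words appears.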
