Using tidytext and broom, but no tidier found for LDA_VEM

Posted: 2018-02-13 11:37:50

Tags: r broom tidytext

The tidytext book has examples of tidiers for topic models. Running a minimal version of that workflow:

library(tidyverse)
library(tidytext)
library(topicmodels)
library(broom)

year_word_counts <- tibble(year = c("2007", "2008", "2009"),
                           word = c("dog", "cat", "chicken"),
                           n = c(1753L, 1157L, 1057L))

animal_dtm <- cast_dtm(data = year_word_counts, document = year, term = word, value = n)

animal_lda <- LDA(animal_dtm, k = 5, control = list( seed = 1234))

animal_lda <- tidy(animal_lda, matrix = "beta")

# Console output
Error in as.data.frame.default(x) : 
  cannot coerce class "structure("LDA_VEM", package = "topicmodels")" to a data.frame
In addition: Warning message:
In tidy.default(animal_lda, matrix = "beta") :
  No method for tidying an S3 object of class LDA_VEM , using as.data.frame

The same error is also reproduced here, but in that case it occurs with library(tidytext).

The versions of all the relevant packages are listed below:

 packageVersion("tidyverse")
 ‘1.2.1’

 packageVersion("tidytext")
 ‘0.1.6’   

 packageVersion("topicmodels")
 ‘0.2.7’  

 packageVersion("broom")
 ‘0.4.3’

Output of sessionInfo():

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] broom_0.4.3       tidytext_0.1.6    forcats_0.2.0     stringr_1.2.0     dplyr_0.7.4       purrr_0.2.4       readr_1.1.1       tidyr_0.8.0      
 [9] tibble_1.4.2      ggplot2_2.2.1     tidyverse_1.2.1   topicmodels_0.2-7

loaded via a namespace (and not attached):
 [1] modeltools_0.2-21 slam_0.1-42       NLP_0.1-11        reshape2_1.4.3    haven_1.1.1       lattice_0.20-35   colorspace_1.3-2  SnowballC_0.5.1  
 [9] stats4_3.4.3      yaml_2.1.16       rlang_0.1.6       pillar_1.1.0      foreign_0.8-69    glue_1.2.0        modelr_0.1.1      readxl_1.0.0     
[17] bindrcpp_0.2      bindr_0.1         plyr_1.8.4        munsell_0.4.3     gtable_0.2.0      cellranger_1.1.0  rvest_0.3.2       psych_1.7.8      
[25] tm_0.7-3          parallel_3.4.3    tokenizers_0.1.4  Rcpp_0.12.15      scales_0.5.0      jsonlite_1.5      mnormt_1.5-5      hms_0.4.1        
[33] stringi_1.1.6     grid_3.4.3        cli_1.0.0         tools_3.4.3       magrittr_1.5      lazyeval_0.2.1    janeaustenr_0.1.5 crayon_1.3.4     
[41] pkgconfig_2.0.1   Matrix_1.2-12     xml2_1.2.0        lubridate_1.7.2   assertthat_0.2.0  httr_1.3.1        rstudioapi_0.7    R6_2.2.2         
[49] nlme_3.1-131      compiler_3.4.3   

4 answers:

Answer 0 (score: 5)

Deleting .Rhistory and .RData restored the correct behavior.
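A minimal sketch of that fix in R, assuming the two files live in the current working directory (the default place R restores them from at startup). Deleting them does not clear objects already loaded in the running session, so R still has to be restarted afterwards.

```r
# Workspace and history files that R restores from the working directory
# at startup in an interactive session.
stale_files <- c(".RData", ".Rhistory")

# Show which of them actually exist before deleting anything.
present <- stale_files[file.exists(stale_files)]
print(present)

# Delete them; unlink() on an empty vector is a harmless no-op.
unlink(present)
```

After this, quit with q(save = "no") so the current workspace is not written straight back to .RData, then start a fresh session. Starting R with the --no-restore command-line flag also skips loading .RData without deleting it.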

Answer 1 (score: 2)

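Wow, this is very mysterious to me. I can't reproduce the error. I installed all the same versions etc., except that I am on macOS rather than Windows. I have tests for the LDA tidiers that run and pass on Windows on AppVeyor, so I would expect this to work.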

For what it's worth, the code you have should work without loading broom.

library(tidyverse)
library(tidytext)
library(topicmodels)

year_word_counts <- tibble(year = c("2007", "2008", "2009"),
                           word = c("dog", "cat", "chicken"),
                           n = c(1753L, 1157L, 1057L))

animal_dtm <- cast_dtm(data = year_word_counts, document = year, term = word, value = n)

animal_lda <- LDA(animal_dtm, k = 5, control = list( seed = 1234))

class(animal_lda)
#> [1] "LDA_VEM"
#> attr(,"package")
#> [1] "topicmodels"

tidy(animal_lda, matrix = "beta")
#> # A tibble: 15 x 3
#>    topic term                                                beta
#>    <int> <chr>                                              <dbl>
#>  1     1 dog     0.0000000000000000000000000000000000000000000372
#>  2     2 dog     0.0000000000000000000000000000000000000000000372
#>  3     3 dog     0.0000000000000000000000000000000000000000000372
#>  4     4 dog     1.00                                            
#>  5     5 dog     0.0000000000000000000000000000000000000000000372
#>  6     1 cat     0.0000000000000000000000000000000000000000000372
#>  7     2 cat     0.0000000000000000000000000000000000000000000372
#>  8     3 cat     0.0000000000000000000000000000000000000000000372
#>  9     4 cat     0.0000000000000000000000000000000000000000000372
#> 10     5 cat     1.00                                            
#> 11     1 chicken 0.0000000000000000000000000000000000000000000372
#> 12     2 chicken 0.0000000000000000000000000000000000000000000372
#> 13     3 chicken 1.00                                            
#> 14     4 chicken 0.0000000000000000000000000000000000000000000372
#> 15     5 chicken 0.0000000000000000000000000000000000000000000372

Created on 2018-02-14 by the reprex package (v0.2.0).

What happens if you also load library(methods)?
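A diagnostic sketch for that suggestion: attach methods explicitly, then check whether the model object is recognised as a member of its parent S4 class before calling tidy(). It assumes (this is not stated in the thread) that tidytext registers its tidier for the virtual parent class LDA (tidy.LDA), so an LDA_VEM object can only reach it through S4 inheritance, which is what the methods package provides. If is() reports TRUE and tidy() succeeds, the session's class machinery is behaving as expected.

```r
library(methods)      # S4 class machinery, attached explicitly here
library(tidytext)     # cast_dtm() and tidy(), usable without attaching broom
library(topicmodels)  # LDA() and the LDA_VEM class

year_word_counts <- tibble::tibble(year = c("2007", "2008", "2009"),
                                   word = c("dog", "cat", "chicken"),
                                   n = c(1753L, 1157L, 1057L))
animal_dtm <- cast_dtm(year_word_counts, document = year, term = word, value = n)
animal_lda <- LDA(animal_dtm, k = 5, control = list(seed = 1234))

# The class named in the error message: "LDA_VEM" from package "topicmodels".
print(class(animal_lda))

# S4 inheritance should also report the virtual parent class "LDA".
print(is(animal_lda, "LDA"))

# If dispatch works, this gives one row per topic-term pair (5 x 3 = 15).
beta <- tidy(animal_lda, matrix = "beta")
print(beta)
```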

Answer 2 (score: 0)

I ran into the same problem when loading a saved LDA model. In the end, for no apparent reason, it worked again after I restarted the R session.

Answer 3 (score: 0)

To add to the very helpful answer from Julia Silge:

I also believe that an interaction between loading the .Rdata file and the topicmodels package is the culprit here. But you can still use a saved workspace:

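I was able to resolve this by restarting RStudio, loading the topicmodels package, and only then loading the .Rdata file. Done in that order, the error message disappeared. Loading the data first and the package afterwards did not work.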
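A sketch of that load order, assuming the model was saved with save() into a workspace file; the file name lda_workspace.RData and the tempdir() location are hypothetical stand-ins so the block runs on its own. In a single R process this only illustrates the order of the calls; the fix described above involves doing the second half in a freshly restarted session.

```r
## Earlier session (run once): fit the model and save the workspace.
year_word_counts <- tibble::tibble(year = c("2007", "2008", "2009"),
                                   word = c("dog", "cat", "chicken"),
                                   n = c(1753L, 1157L, 1057L))
animal_dtm <- tidytext::cast_dtm(year_word_counts, document = year,
                                 term = word, value = n)
animal_lda <- topicmodels::LDA(animal_dtm, k = 5, control = list(seed = 1234))

ws_file <- file.path(tempdir(), "lda_workspace.RData")  # hypothetical path
save(animal_lda, file = ws_file)
rm(animal_lda)

## Later, after restarting R: attach the packages FIRST ...
library(topicmodels)  # defines the LDA_VEM class
library(tidytext)     # provides cast_dtm() and tidy()

## ... and only then restore the saved objects.
load(ws_file)
beta <- tidy(animal_lda, matrix = "beta")
print(beta)
```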

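One more remark on workspaces: for LDA, using them together with Rscript is really the only efficient way of working that I have found. Depending on the parameters and the size of the corpus, fitting an LDA model can take several hours, so it is crucial to save the fitted model first and do the further analysis afterwards.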
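A variation on that workflow, swapped in here rather than taken from the answer: instead of saving the whole workspace, the long-running Rscript job saves only the fitted model with saveRDS(), and the analysis script reads it back with readRDS() after attaching topicmodels. The script names fit_lda.R and analyse_lda.R and the file path are hypothetical; tempdir() is used only so the sketch runs in one process, and two separate Rscript runs would need a fixed path shared by both.

```r
## fit_lda.R (hypothetical): the slow step, e.g. `Rscript fit_lda.R`
library(tidytext)
library(topicmodels)

year_word_counts <- tibble::tibble(year = c("2007", "2008", "2009"),
                                   word = c("dog", "cat", "chicken"),
                                   n = c(1753L, 1157L, 1057L))
animal_dtm <- cast_dtm(year_word_counts, document = year, term = word, value = n)
animal_lda <- LDA(animal_dtm, k = 5, control = list(seed = 1234))

# Persist only the fitted model (hypothetical path; use a fixed one in practice).
model_file <- file.path(tempdir(), "animal_lda.rds")
saveRDS(animal_lda, model_file)

## analyse_lda.R (hypothetical): reuse the fit without refitting.
library(topicmodels)  # attach the class-defining package before reading
library(tidytext)
fit <- readRDS(model_file)
beta <- tidy(fit, matrix = "beta")
print(beta)
```

Compared with a full workspace, an .rds file holds a single object, so the analysis script decides explicitly what it loads and when, after its library() calls.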