How do I convert LDA output to a word-topic matrix in R?

Asked: 2017-02-15 22:51:46

Tags: r text-mining lda topic-modeling

library(tm)
library(topicmodels)
# dtm is my DocumentTermMatrix
lda_topicmodel <- LDA(dtm, k = 20, control = list(seed = 1234))

I ran Latent Dirichlet Allocation using the LDA function in R. I now have an LDA object in S4 format.

How can I convert it into a word-topic matrix and a document-topic matrix in R?

Unfortunately, an object of type 'S4' is not subsettable, so I had to resort to copying a subset of the data to work with.

Topic 1     Topic 2   Topic 3   Topic 4    Topic 5     Topic 6    Topic 7         Topic 8    Topic 9      Topic 10    
[1,] "flooding"  "beach"   "sets"    "flooding" "storm"     "fwy"      "storms"        "flooding" "socal"      "rain"      
[2,] "erosion"   "long"    "alltime" "just"     "flooding"  "due"      "thunderstorms" "via"      "major"      "california"
[3,] "cause"     "abc7"    "rain"    "almost"   "years"     "closures" "flash"         "public"   "throughout" "nearly"    
[4,] "emergency" "day"     "slides"  "hardcore" "mudslides" "avoid"    "continue"      "asks"     "abc7"       "southern"  
[5,] "highway"   "history" "last"    "spun"     "snow"      "latest"   "possible"      "call"     "streets"    "storms"  



Topic 11 Topic 12   Topic 13  Topic 14      Topic 15      Topic 16 Topic 17   Topic 18   Topic 19     Topic 20     
[1,] "abc7"   "abc7"     "like"    "widespread"  "widespread"  "across" "rainfall" "flooding" "flooding"   "vehicles"   
[2,] "beach"  "flooding" "closed"  "batters"     "biggest"     "can"    "record"   "region"   "storm"      "several"    
[3,] "long"   "stranded" "live"    "california"  "evacuations" "stay"   "breaks"   "reported" "california" "getting"    
[4,] "fwy"    "county"   "raining" "evacuations" "mudslides"   "home"   "long"     "corona"   "causes"     "floodwaters"
[5,] "710"    "san"      "blog"    "mudslides"   "years"       "wires"  "beach"    "across"   "related"    "stranded" 

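The image shows a subset of the words in each topic (image: LDA word-topic). I would like to write the contents of the S4 object to a CSV file as a word-topic matrix, like the one shown here (image: Word-Topic Matrix).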

1 answer:

Answer 0 (score: 1)

I'm using a dataset that ships with the topicmodels package, since we can't reproduce your data.

# load the libraries
library(topicmodels)
library(tm)

# load the data we'll be using
data("AssociatedPress")

# estimate an LDA model using the VEM algorithm (the default)
# k (the number of topics) is set to 2
# just as an example
ap_lda <- LDA(AssociatedPress, 
              k = 2, 
              control = list(seed = 1234))

# get all the terms in a dataframe 
as.data.frame(terms(ap_lda, dim(ap_lda)[1]))

The first rows of the output are:

  Topic 1    Topic 2
1 percent          i
2 million  president
3     new government
4    year     people
5 billion     soviet
6    last        new
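
The terms() call above returns only the ranked words, not their probabilities. If the goal is a numeric word-topic matrix and a document-topic matrix that can be written to CSV, one possible approach is the posterior() function from topicmodels, which returns a list with a terms element (topics by terms) and a topics element (documents by topics). The following is a minimal sketch under that assumption; the output file names are placeholders chosen for illustration, not anything from the question.

```r
library(topicmodels)
library(tm)

data("AssociatedPress")

# Same model as above: 2 topics, fixed seed
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))

# posterior() returns a list:
#   $terms  : topics x terms matrix of per-topic word probabilities
#   $topics : documents x topics matrix of per-document topic proportions
post <- posterior(ap_lda)

# Transpose so that words are rows and topics are columns
word_topic <- t(post$terms)
doc_topic  <- post$topics

# Label the topic columns to match the style of the terms() output
colnames(word_topic) <- paste("Topic", seq_len(ncol(word_topic)))
colnames(doc_topic)  <- paste("Topic", seq_len(ncol(doc_topic)))

# Write both matrices to CSV (file names are placeholders)
write.csv(word_topic, "word_topic_matrix.csv")
write.csv(doc_topic, "doc_topic_matrix.csv")
```

With your own model, replace ap_lda with lda_topicmodel. The same values are also stored in the S4 slots of the fitted object, which are accessed with @ rather than [ ]: lda_topicmodel@beta holds the log word probabilities per topic and lda_topicmodel@gamma holds the document-topic proportions.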