Getting the top terms for each topic from LDA in R

Time: 2016-02-08 07:08:51

Tags: r

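I am implementing LDA on some simple datasets. The topic modeling itself runs, but when I try to list the top 6 terms for each topic, I get numeric values instead (possibly their indices).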

# docs is the dataset formatted and cleaned properly    
dtm<- TermDocumentMatrix(docs, control = list(removePunctuation = TRUE, stopwords=TRUE))
ldaOut<-LDA(dtm,k,method="Gibbs",control=list(nstart=nstart,seed=seed,best=best,burnin=burnin,iter=iter,thin=thin))

# 6 top terms in each topic 
ldaOut.terms<-as.matrix(terms(ldaOut,6))    

write.csv(ldaOut.terms,file=paste("LDAGibbs",k,"TopicsToTerms.csv"))    

The TopicsToTerms file is generated as:

    Topic 1 Topic 2 Topic 3 
1   1        5       3  
2   2        1       4  
3   3        2       1  
4   4        3       2  
5   5        4       5  

whereas I want the terms themselves (the top words of each topic) in the table, like this:

    Topic 1   Topic 2     Topic 3   
1     Hat       Cat        Food 

1 answer:

Answer 0 (score: 1)

You only need one line of code to solve the problem:

> text = read.csv("~/Desktop/your_data.csv") #your initial dataset
> docs = Corpus(VectorSource(text)) #converting to corpus
> docs = tm_map(docs, content_transformer(tolower)) #cleaning
> ... #cleaning
> dtm = DocumentTermMatrix(docs) #creating a document term matrix
> rownames(dtm) = text

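After adding the last line, you can continue with the rest of your code, and you will get the terms rather than their indices. Hope that helps.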
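
For what it's worth, a likely reason for the numbers, judging from the code in the question: `LDA()` from topicmodels expects documents in rows and terms in columns, and `terms()` reports column names. A `TermDocumentMatrix` has documents in its columns, so the "terms" that come back would be document IDs. Below is a minimal sketch of the same pipeline built on `DocumentTermMatrix` instead. The toy `texts` vector and the control values (`seed`, `burnin`, `iter`) are made up for illustration and are not from the original post.

```r
library(tm)
library(topicmodels)

# Hypothetical toy documents, invented only for this example
texts <- c("the cat wore a hat",
           "the cat ate food",
           "a hat on the cat",
           "food for the cat and dog",
           "the dog ate the food")

docs <- VCorpus(VectorSource(texts))
docs <- tm_map(docs, content_transformer(tolower))

# Documents in rows, terms in columns: the orientation LDA() works with
dtm <- DocumentTermMatrix(docs,
                          control = list(removePunctuation = TRUE, stopwords = TRUE))

# For comparison: the transposed matrix has document IDs as column names
tdm <- TermDocumentMatrix(docs,
                          control = list(removePunctuation = TRUE, stopwords = TRUE))
colnames(tdm)  # document IDs, not words
colnames(dtm)  # the actual terms

k <- 2  # assumed number of topics for the toy data
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(seed = 1, burnin = 100, iter = 200))

# Top 3 terms per topic; the entries are words taken from colnames(dtm)
ldaOut.terms <- as.matrix(terms(ldaOut, 3))
print(ldaOut.terms)
```

With the matrix in this orientation, the `write.csv()` call from the question should write words rather than numbers.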