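I am implementing LDA on some simple datasets. The topic modeling itself works, but when I try to list the top 6 terms for each topic, I get numeric values (probably their indices) instead of the terms.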
# docs is the dataset formatted and cleaned properly
dtm<- TermDocumentMatrix(docs, control = list(removePunctuation = TRUE, stopwords=TRUE))
ldaOut<-LDA(dtm,k,method="Gibbs",control=list(nstart=nstart,seed=seed,best=best,burnin=burnin,iter=iter,thin=thin))
# 6 top terms in each topic
ldaOut.terms<-as.matrix(terms(ldaOut,6))
write.csv(ldaOut.terms,file=paste("LDAGibbs",k,"TopicsToTerms.csv"))
The TopicsToTerms file is generated as:
Topic 1 Topic 2 Topic 3
1 1 5 3
2 2 1 4
3 3 2 1
4 4 3 2
5 5 4 5
whereas I want the terms (the top words of each topic) in the table, like this:
Topic 1 Topic 2 Topic 3
1 Hat Cat Food
Answer 0 (score: 1)
You only need one line of code to fix this:
> text = read.csv("~/Desktop/your_data.csv") #your initial dataset
> docs = Corpus(VectorSource(text)) #converting to corpus
> docs = tm_map(docs, content_transformer(tolower)) #cleaning
> ... #cleaning
> dtm = DocumentTermMatrix(docs) #creating a document term matrix
> rownames(dtm) = text
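After adding the last line, you can continue with the rest of your code, and you will get the terms rather than their indices. Hope that helps.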
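For what it's worth, here is a minimal end-to-end sketch of the same idea. Note that `LDA()` in topicmodels expects a document-term matrix (documents in rows, terms in columns). The question builds a `TermDocumentMatrix`, so the document numbers end up being treated as the "terms", which would explain the 1 to 5 values in the output. The toy `texts`, `k = 2`, and the seed and iteration values below are made up for illustration and are not from the original post.

```r
library(tm)          # corpus handling and DocumentTermMatrix()
library(topicmodels) # LDA() and terms()

# Toy documents, invented purely for illustration
texts <- c("the cat wore a red hat",
           "my cat likes fish food",
           "dog food and cat food",
           "a hat for the dog",
           "the dog chased the cat")

docs <- VCorpus(VectorSource(texts))

# Documents in rows, terms in columns: the orientation LDA() expects
dtm <- DocumentTermMatrix(docs,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

# Drop any document left empty after cleaning; LDA() rejects all-zero rows
dtm <- dtm[slam::row_sums(dtm) > 0, ]

k <- 2  # number of topics (assumed value)
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(seed = 2003, burnin = 100, iter = 500))

# Top 3 terms per topic: a character matrix of words, not indices
ldaOut.terms <- as.matrix(terms(ldaOut, 3))
print(ldaOut.terms)
```

With the matrix in this orientation, the `write.csv()` call from the question should write words rather than numbers.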