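I am facing the following problem. I extracted tweets and now have about 500 of them collected in a CSV file. With these I want to generate topics using an LDA model. So far so good. I do get topics, but now I would like to know which tweets belong to which topics, and I just don't know how to do that. In my CSV file I numbered each tweet. I thought that way I could retrieve the tweets behind each topic, but the command "topics(lda)" does not work as I expected. Maybe someone can help me, please ^^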
This is the CSV file I use (LDA_start). It has only two columns (number, text).
This is the code I found in an LDA modeling tutorial (I am an R beginner):
library("SocialMediaLab")
library("topicmodels")
library("slam")
library("Rmpfr")
library("tm")
library("stringr")
myData = read.csv("LDA_start.csv", header = TRUE)
tweetCorpus <- VCorpus(VectorSource(myData))
myStopwords <- c(stopwords('english'))
tweetCorpus <- tm_map(tweetCorpus, removeWords, myStopwords)
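# Build the document-term matrix with stemming, lowercasing, and number/punctuation removal
dtmTopicModeling <- DocumentTermMatrix(
  tweetCorpus,
  control = list(stemming = TRUE, tolower = TRUE, removeNumbers = TRUE,
                 removePunctuation = TRUE, wordLengths = c(3, 30))
)
# Harmonic mean of the log-likelihoods, used to compare models with different k
harmonicMean <- function(logLikelihoods, precision = 2000L) {
  llMed <- median(logLikelihoods)
  as.double(llMed - log(mean(exp(-mpfr(logLikelihoods, prec = precision) + llMed))))
}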
burnin <- 1000
iter <- 1000
keep <- 50
# Candidate numbers of topics: 2, 7, 12, ..., 97
sequ <- seq(2, 100, 5)
# Fit one Gibbs-sampled LDA model per candidate k
fitted_many <- lapply(sequ, function(k) {
  LDA(dtmTopicModeling, k = k, method = "Gibbs",
      control = list(burnin = burnin, iter = iter, keep = keep))
})
# Drop the log-likelihoods recorded during burn-in
logLiks_many <- lapply(fitted_many, function(L) L@logLiks[-c(1:(burnin / keep))])
hm_many <- sapply(logLiks_many, function(h) harmonicMean(h))
# Pick the k with the highest harmonic mean and refit with a fixed seed
k <- sequ[which.max(hm_many)]
seedNum <- 42
lda <- LDA(dtmTopicModeling, k = k, method = "Gibbs",
           control = list(burnin = burnin, iter = iter, keep = keep, seed = seedNum))
write.csv(terms(lda,50), "TopicModel.csv")
topics(lda)
1
1
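A possible explanation for getting only two values back, sketched here on made-up data: a data frame is a list of columns, so `VectorSource(myData)` appears to hand tm one document per column (number, text) rather than one per tweet. The toy data frame below is a hypothetical stand-in for LDA_start.csv, and the column names `number` and `text` are assumed from the description above.

```r
library(tm)

# Hypothetical stand-in for LDA_start.csv (contents invented for illustration)
myData <- data.frame(
  number = 1:3,
  text = c("first tweet about cats",
           "second tweet about dogs",
           "third tweet about birds"),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so its length is the column count
n_cols   <- length(myData)        # 2
n_tweets <- length(myData$text)   # 3

# Passing the whole data frame gives one document per column
docs_from_frame <- nDocs(DocumentTermMatrix(VCorpus(VectorSource(myData))))

# Passing only the text column gives one document per tweet
docs_from_text <- nDocs(DocumentTermMatrix(VCorpus(VectorSource(myData$text))))

print(c(frame = docs_from_frame, text = docs_from_text))
```

If this is what happens with the real file, building the corpus from `myData$text` would be the change to try first.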
Is there an easier way to generate topics and find the tweets that make up each topic than my approach? I would really appreciate your answers!
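A minimal sketch of one simpler route, under stated assumptions: the corpus is built from the text column only (one document per tweet), the number of topics is fixed by hand instead of selected via the harmonic mean, and the toy tweets are invented for illustration. `topics()` then returns the most likely topic for each tweet, and `posterior()` gives the full per-tweet topic probabilities. The sampler settings here are small only to keep the example fast.

```r
library(tm)
library(topicmodels)

# Hypothetical stand-in for LDA_start.csv (column names assumed: number, text)
myData <- data.frame(
  number = 1:6,
  text = c("The cat sat on the warm mat",
           "Dogs love long walks in the park",
           "My cat chased the neighbour's dog",
           "Stock markets fell sharply today",
           "Investors worry about rising interest rates",
           "The bank raised interest rates again"),
  stringsAsFactors = FALSE
)

# One document per tweet: use the text column, not the whole data frame
corpus <- VCorpus(VectorSource(myData$text))
dtm <- DocumentTermMatrix(
  corpus,
  control = list(tolower = TRUE, removePunctuation = TRUE,
                 stopwords = TRUE, wordLengths = c(3, 30))
)

# LDA cannot fit documents that have no terms left, so drop empty rows
# and keep the matching rows of the original data
keep_rows <- slam::row_sums(dtm) > 0
dtm <- dtm[keep_rows, ]
kept <- myData[keep_rows, ]

# k = 2 is an arbitrary choice for this toy example
lda <- LDA(dtm, k = 2, method = "Gibbs",
           control = list(seed = 42, burnin = 100, iter = 200))

# Most likely topic per tweet, plus the probability of that topic
kept$topic <- topics(lda)
kept$prob  <- apply(posterior(lda)$topics, 1, max)

print(kept[, c("number", "topic", "prob")])
```

Because `kept` keeps the original `number` column, the result can be written out with `write.csv()` and joined back to the tweets, or filtered with something like `kept[kept$topic == 1, ]` to list the tweets assigned to one topic.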