Question

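I'm not really into text mining, so I'm asking my question here. I have some texts and I want to analyze them for their topics (tags), so I'm wondering what the best approach would be.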

First, I prepared the texts and removed the stopwords using the tm package:

library(tm)

sample2 = c('This text is about the wheather. Today the wheater is really bad.', 'The dog is barking very loud. That is annoying.')
myStopwords <- c(stopwords("english"), "today")

df <- do.call("rbind", lapply(sample2, as.data.frame))
colnames(df) = "texts"
corpus <- Corpus(VectorSource(df$texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, myStopwords)
corpus <- tm_map(corpus, stemDocument, language = "english")

Then I created a TermDocumentMatrix:

# wordLengths replaces the older minWordLength option, which current tm versions ignore
td.mat <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
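
For reference, the per-document term counts can be read out of a matrix like this roughly as follows. This is only a minimal sketch: the two-document vector `docs` and the names `corpus_demo`, `tdm_demo` and `top_terms` are made up for illustration and are not part of my code above. It also only counts terms; it does not tell nouns apart from verbs or adjectives, which is the part I'm missing.

```r
library(tm)

# Hypothetical, already cleaned texts (stand-ins for the corpus built above)
docs <- c("text weather weather bad", "dog barking loud annoying")
corpus_demo <- VCorpus(VectorSource(docs))
tdm_demo <- TermDocumentMatrix(corpus_demo,
                               control = list(wordLengths = c(1, Inf)))

# Dense matrix: one row per term, one column per document
m <- as.matrix(tdm_demo)

# For each document, keep the terms that occur in it, most frequent first
top_terms <- lapply(seq_len(ncol(m)), function(j) {
  counts <- m[, j]
  sort(counts[counts > 0], decreasing = TRUE)
})
names(top_terms) <- colnames(m)

print(top_terms)
```

Converting to a dense matrix is fine for a handful of short texts like mine; for a large corpus, tm's `findFreqTerms()` on the sparse matrix would probably be the better choice.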

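Now I want to find the frequent nouns (no verbs or adjectives). Text 1 should get "text" and "wheater" as tags, and text 2 should get "dog". Can anyone tell me how to do this?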

R: automatically finding tags in a text

0 answers: