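I have some simple code to perform text analysis. Before creating the DTM, I apply stemCompletion. However, I don't understand the result: either I am doing something wrong, or this is simply how it behaves.
I have already referred to this link for help: text-mining-with-the-tm-package-word-stemming
The problem I see is that after stemming, my DTM shrinks and does not return the tokens at all (it returns 'content' and 'meta' instead).
My code and output: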
library(tm)
texts <- c("i am member of the XYZ association",
           "apply for our open associate position",
           "xyz memorial lecture takes place on wednesday",
           "vote for the most popular lecturer")
myCorpus <- Corpus(VectorSource(texts))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeURL)) #??
myCorpusCopy <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
for (i in 1:4) {
  cat(paste("[[", i, "]] ", sep = ""))
  writeLines(as.character(myCorpus[[i]]))
}
Output:
[[1]] i am member of the xyz associ
[[2]] appli for our open associ posit
[[3]] xyz memori lectur take place on wednesday
[[4]] vote for the most popular lectur
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
for (i in 1:4) {
  cat(paste("[[", i, "]] ", sep = ""))
  writeLines(as.character(myCorpus[[i]]))
}
Output:
[[1]] content
meta
[[2]] content
meta
[[3]] content
meta
[[4]] content
meta
myCorpus <- tm_map(myCorpus, PlainTextDocument)
dtm <- DocumentTermMatrix(myCorpus, control = list(weighting = weightTf))
dtm
inspect(dtm)
Output:
> inspect(dtm)
<<DocumentTermMatrix (documents: 4, terms: 2)>>
Non-/sparse entries: 8/0
Sparsity           : 0%
Maximal term length: 7
Weighting          : term frequency (tf)

              Terms
Docs           content meta
  character(0)       1    1
  character(0)       1    1
  character(0)       1    1
  character(0)       1    1
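Expected output: stemming (both stemming and stem completion) runs successfully. I am using the tm package, version 0.6.
Answer 0: (score: 0)
You are using the function incorrectly: stemCompletion expects a character vector of stems, not whole documents passed through tm_map. Here is how it works: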
library(tm)
texts <- c("i am member of the XYZ association",
           "apply for our open associate position",
           "xyz memorial lecture takes place on wednesday",
           "vote for the most popular lecturer")
corp <- Corpus(VectorSource(texts))
tdm <- TermDocumentMatrix(corp, control = list(stemming = TRUE))
Terms(tdm)
# [1] "appli" "associ" "for" "lectur" "member" "memori" "most" "open"
# [9] "our" "place" "popular" "posit" "take" "the" "vote" "wednesday"
# [17] "xyz"
stemCompletion(Terms(tdm), corp)
#       appli      associ         for      lectur      member      memori        most        open
#          "" "associate"       "for"   "lecture"    "member"  "memorial"      "most"      "open"
#         our       place     popular       posit        take         the        vote   wednesday
#       "our"     "place"   "popular"  "position"     "takes"       "the"      "vote" "wednesday"
#         xyz
#       "xyz"
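If what you want in the end is a DTM built from the completed words (rather than just a vector of completed terms), one possible approach is to stem and complete plain character vectors first and only then build the corpus. The following is only a sketch under a few assumptions: complete_text is a hypothetical helper name, the dictionary is simply the set of unique words in the input texts, and SnowballC::wordStem is called directly for the stemming step. Stems for which no completion is found (returned as "" or NA, depending on the tm version) are left as they are.

```r
library(tm)         # stemCompletion(), Corpus(), DocumentTermMatrix()
library(SnowballC)  # wordStem()

texts <- c("i am member of the xyz association",
           "apply for our open associate position",
           "xyz memorial lecture takes place on wednesday",
           "vote for the most popular lecturer")

# Dictionary of complete words: here just the unique words of the input
dictionary <- unique(unlist(strsplit(texts, " ", fixed = TRUE)))

# Hypothetical helper: stem every word of one text, then complete each stem
complete_text <- function(x, dictionary) {
  stems <- wordStem(unlist(strsplit(x, " ", fixed = TRUE)), language = "english")
  completed <- unname(stemCompletion(stems, dictionary = dictionary))
  # Fall back to the bare stem when no completion was found
  missing <- is.na(completed) | completed == ""
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

completed_texts <- vapply(texts, complete_text, character(1),
                          dictionary = dictionary, USE.NAMES = FALSE)

# Build the corpus and the DTM from the completed texts
dtm <- DocumentTermMatrix(Corpus(VectorSource(completed_texts)))
inspect(dtm)
```

Because stemCompletion only ever sees character vectors of stems here, the documents are never replaced by their 'content' and 'meta' slot names.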