tm throws an error
library(tm)
data(crude)
# control parameters
dtm.control <- list(
    tolower = TRUE,
    removePunctuation = TRUE,
    removeNumbers = TRUE,
    stopWords = stopwords("english"),
    stemming = TRUE, # false for sentiment
    wordLengths = c(3, "inf"))
dtm <- DocumentTermMatrix(corp, control = dtm.control)
Error:
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  NAs introduced by coercion
What am I doing wrong? Also:
I am using these tutorials:
Are there better or more recent walkthroughs?
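Answer 0 (score: 0)
You might consider making a few changes to your code, especially to removeStopWords and to how the corpus is created. The following worked for me: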
library(tm)
data("crude")
# control parameters
dtm.control <- list(
    tolower = TRUE,
    removePunctuation = TRUE,
    removeNumbers = TRUE,
    removestopWords = TRUE,
    stemming = TRUE, # false for sentiment
    wordLengths = c(3, "inf"))
corp <- Corpus(VectorSource(crude))
dtm <- DocumentTermMatrix(corp, control = dtm.control)
> inspect(dtm)
<<DocumentTermMatrix (documents: 20, terms: 848)>>
Non-/sparse entries: 1877/15083
Sparsity : 89%
Maximal term length: 16
Weighting : term frequency (tf)
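
For comparison, here is a minimal sketch that uses the control option names documented for termFreq() in tm. The documented name for stop word removal is `stopwords` (all lower case), and `wordLengths` is documented as a numeric vector, so the upper bound is written as `Inf` rather than the string "inf". As far as I can tell, unrecognized names in the control list are silently ignored, so a misspelled option may simply have no effect. This assumes a reasonably recent tm release and passes `crude` in directly, since it already ships as a corpus. Stemming is switched off here only because it needs the SnowballC package. I have not confirmed that any of this is the cause of the error in the question.

```r
library(tm)
data("crude")  # 20 Reuters articles, already a VCorpus

# Control list using the option names documented for termFreq()
dtm.control <- list(
    tolower = TRUE,
    removePunctuation = TRUE,
    removeNumbers = TRUE,
    stopwords = TRUE,        # or a character vector, e.g. stopwords("english")
    stemming = FALSE,        # TRUE requires the SnowballC package
    wordLengths = c(3, Inf)  # numeric Inf, not the string "inf"
)

# crude is already a corpus, so it can be passed in as-is
dtm <- DocumentTermMatrix(crude, control = dtm.control)
inspect(dtm)
```

The term count will differ from the output above, because stop words are actually removed and no stemming is applied.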