Error in enc2utf8(x) : argument is not a character vector
is the error I get when I try to run the following code in R 3.1.2. Can anyone help me understand what I am missing here?
The operating system is Windows.
# Text cleaning: tm code
clean <- function(text) {
  library(NLP)
  library(tm)
  sample <- Corpus(VectorSource(text), readerControl = list(language = "english"))
  # The error is raised here: each x is a tm document, not a character vector
  sample <- tm_map(sample, function(x) iconv(enc2utf8(x), sub = "byte"))
  sample <- tm_map(sample, removePunctuation)
  sample <- tm_map(sample, stripWhitespace)
  sample <- tm_map(sample, removeNumbers)
  sample <- tm_map(sample, removeWords, stopwords("smart"))
  sample <- tm_map(sample, stripWhitespace)
  dtm <- DocumentTermMatrix(sample)
  return(list(sample, dtm))
}
fileName <- "input.txt"
test <- readChar(fileName, file.info(fileName)$size)
clean(test)
Answer 0 (score: 3)
You have to refer to the content of the corpus, i.e. the character vector inside sample$content:
tm_map(sample, function(x) iconv(enc2utf8(x$content), sub = "byte"))
Here I replaced enc2utf8(x) with enc2utf8(x$content).
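To see why the original call fails, the sketch below reproduces it in base R without tm. It assumes a hypothetical stand-in for a tm document: a plain list holding the text in $content and metadata in $meta, tagged with tm's class names (the real PlainTextDocument carries more than this). enc2utf8() rejects the whole list, while the character vector in its $content is accepted.

```r
# Hypothetical stand-in for a tm PlainTextDocument: a list that stores
# the text in $content and metadata in $meta (tm's real class has more).
doc <- structure(
  list(content = "Text cleaning with tm, version 0.6!", meta = list(language = "en")),
  class = c("PlainTextDocument", "TextDocument")
)

# Passing the whole document object fails: enc2utf8() only accepts
# character vectors, which is the error reported in the question.
err <- tryCatch(enc2utf8(doc), error = function(e) conditionMessage(e))
print(err)

# Passing the character vector stored inside the document works.
# sub = "byte" would show any unconvertible byte as <xx> (hex code).
fixed <- iconv(enc2utf8(doc$content), sub = "byte")
print(fixed)
```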
Answer 1 (score: 0)
Hi, changing the code to the following two lines may solve your problem:
sample <- VCorpus(VectorSource(text), readerControl = list(language = "english"))
sample <- tm_map(sample, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
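This approach works because content_transformer() applies a function to each document's text and hands back the document itself, so the corpus still holds documents for later steps such as DocumentTermMatrix(). The base-R sketch below imitates that idea with a hypothetical my_content_transformer() and hand-built documents, using lapply() in place of tm_map(); it is a simplified illustration under those assumptions, not tm's actual implementation.

```r
# Hypothetical, simplified stand-in for tm::content_transformer():
# wraps a function that works on character vectors so it is applied
# to a document's $content, and the (modified) document is returned.
my_content_transformer <- function(FUN) {
  function(x, ...) {
    x$content <- FUN(x$content, ...)
    x
  }
}

# Hypothetical two-document "corpus": plain lists shaped like tm documents.
make_doc <- function(text) {
  structure(list(content = text, meta = list()),
            class = c("PlainTextDocument", "TextDocument"))
}
docs <- list(make_doc("Hello,   World! 2015"), make_doc("R 3.1.2 on Windows"))

# The encoding fix from the answer, wrapped so it receives the text only.
to_utf8 <- my_content_transformer(function(x) iconv(enc2utf8(x), sub = "byte"))
cleaned <- lapply(docs, to_utf8)

# Each element is still a document, not a bare character vector.
print(class(cleaned[[1]]))
print(cleaned[[2]]$content)
```

With a bare function instead of the wrapper, each element of the result would be a plain character vector, which is why later corpus steps can stop working.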