Error in enc2utf8(x): argument is not a character vector

Time: 2014-12-15 05:54:35

Tags: r text-mining

Error in enc2utf8(x) : argumemt is not a character vector is the error I get when I try to run the following code in R 3.1.2. Can anyone help me understand what I am missing here?

The operating system is Windows.

#Text Cleaning: tm Code
  clean<-function(text){
  library(NLP)
  library(tm)
  sample<- Corpus(VectorSource(text),readerControl=list(language="english"))
  sample<- tm_map(sample, function(x) iconv(enc2utf8(x), sub = "bytes"))
  sample<-tm_map(sample,removePunctuation)
  sample <- tm_map(sample, stripWhitespace)
  sample<-tm_map(sample,removeNumbers)
  sample<-tm_map(sample,removeWords,stopwords('smart'))
  sample <- tm_map(sample, stripWhitespace)
  sample <- tm_map(sample, stripWhitespace)
  dtm <- DocumentTermMatrix(sample)
  return(list(sample,dtm))
  }
 fileName <- 'input.txt'
 test = readChar(fileName, file.info(fileName)$size)
 clean (test)

2 answers:

Answer 0: (score: 3)

You have to refer to the content of the corpus, that is, the character vector in sample$content:

tm_map(sample, function(x) iconv(enc2utf8(x$content), sub = "bytes"))

Here I replaced enc2utf8(x) with enc2utf8(x$content).
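
To illustrate why the original call fails, here is a minimal base-R sketch. The `doc` list is a hypothetical stand-in for a tm document (real tm documents carry more structure and metadata); it only assumes that the text lives in a `content` field, as this answer describes.

```r
# Hypothetical stand-in for a tm document: a list holding the text in `content`.
doc <- list(content = "Some raw text", meta = list(language = "en"))

# enc2utf8() rejects the whole object because a list is not a character vector.
failed <- inherits(try(enc2utf8(doc), silent = TRUE), "try-error")

# The `content` field is a character vector, so the same call succeeds on it.
converted <- iconv(enc2utf8(doc$content), sub = "bytes")

print(failed)     # whether the call on the whole object raised an error
print(converted)  # the converted text
```

The function passed to tm_map in the question receives the whole document object rather than its text, which appears to be what triggers the error.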

Answer 1: (score: 0)

Hi, changing the code to the following two lines may solve your problem:

sample <- VCorpus(VectorSource(text), readerControl = list(language = "english"))
sample <- tm_map(sample, content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes")))
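
As a rough end-to-end sketch of this approach, the snippet below runs a shortened version of the question's cleaning steps on two made-up strings. It assumes the tm package (version 0.6 or later, where content_transformer is available) is installed; the sample `text` vector is invented for illustration and is not from the original post.

```r
library(tm)

# Made-up input for illustration; the question reads its text from 'input.txt'.
text <- c("Hello, world! 123", "Text   mining in R.")

sample <- VCorpus(VectorSource(text), readerControl = list(language = "english"))

# content_transformer() wraps a character-level function so that tm_map()
# applies it to each document's content and keeps the document structure.
# Note: "byte" is the documented special value of iconv()'s `sub` argument
# (unconvertible bytes become <xx>); the original post passes "bytes".
sample <- tm_map(sample, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))

# tm's own transformations can be passed to tm_map() directly.
sample <- tm_map(sample, removePunctuation)
sample <- tm_map(sample, removeNumbers)
sample <- tm_map(sample, stripWhitespace)

dtm <- DocumentTermMatrix(sample)
print(dim(dtm))
```

Because the corpus still consists of proper documents after each step, DocumentTermMatrix should accept it without further conversion.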