Question

I get the following error when I try to run the code below in R 3.1.2: Error in enc2utf8(x) : argumemt is not a character vector. Can anyone help me understand what I am missing here?

The operating system is Windows.

# Text cleaning: tm code
clean <- function(text) {
  library(NLP)
  library(tm)
  sample <- Corpus(VectorSource(text), readerControl = list(language = "english"))
  sample <- tm_map(sample, function(x) iconv(enc2utf8(x), sub = "bytes"))
  sample <- tm_map(sample, removePunctuation)
  sample <- tm_map(sample, stripWhitespace)
  sample <- tm_map(sample, removeNumbers)
  sample <- tm_map(sample, removeWords, stopwords('smart'))
  sample <- tm_map(sample, stripWhitespace)
  dtm <- DocumentTermMatrix(sample)
  return(list(sample, dtm))
}
fileName <- 'input.txt'
test <- readChar(fileName, file.info(fileName)$size)
clean(test)

Answer 1

You have to refer to the content of the corpus, i.e. the character vector in sample$content:

tm_map(sample, function(x) iconv(enc2utf8(x$content), sub = "bytes"))

Here I replaced enc2utf8(x) with enc2utf8(x$content).
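To see why the original call fails, here is a minimal base-R sketch that does not need tm at all. It assumes a document can be approximated by a plain list with a content element and some metadata; the doc object below is a hypothetical stand-in, not tm's actual document class.

```r
# Hypothetical, simplified stand-in for one document in a corpus:
# the text lives in `content`, everything else is metadata.
doc <- list(content = "Some sample text.", meta = list(language = "en"))

# Passing the whole object to enc2utf8() raises an error,
# because a list is not a character vector.
whole <- tryCatch(enc2utf8(doc), error = function(e) e)
print(inherits(whole, "error"))

# Passing only the character content works.
fixed <- iconv(enc2utf8(doc$content), sub = "bytes")
print(is.character(fixed))
```

The same reasoning applies inside tm_map: the function receives a document object, so the character data has to be pulled out before it reaches enc2utf8.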

Answer 2

Hi, changing the following two lines may solve your problem:

sample <- VCorpus(VectorSource(text), readerControl = list(language = "english"))
sample <- tm_map(sample, content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes")))
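As a quick check of this approach, here is a small sketch on an in-memory string instead of input.txt. It assumes the tm package is installed; the sample text and the variable names text and corp are made up for illustration.

```r
library(tm)  # also attaches NLP, which provides content()

# Made-up input standing in for the contents of the file.
text <- "Hello,   world! 123"

corp <- VCorpus(VectorSource(text))

# content_transformer() wraps a function on character vectors so that it
# is applied to each document's content and the document is returned.
corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes")))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, stripWhitespace)

# The content of the first document is still a character vector.
print(content(corp[[1]]))
```

Wrapping the custom function in content_transformer keeps each element of the corpus a proper document, so later steps such as DocumentTermMatrix should still receive the structure they expect.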

