Question

I am trying to read text files in R. The code below used to work, but when I re-ran it, it no longer does.

# There are several text files in the folder 'Obama' and in the folder 'Romney'
candidates<-c("Obama","Romney")
pathname<-"C:/txt"
s.dir<-sprintf("%s/%s",pathname,candidates)
article<-Corpus(DirSource(directory=s.dir,encoding="ANSI"))

The error shown is:

Error in iconv(readLines(x, warn = FALSE), encoding, "UTF-8", "byte") : 
unsupported conversion from 'ANSI' to 'UTF-8' in codepage 936

In addition, when I try to read a single text file with the code below:

m<-"C:/txt/Romney/1.txt"
cc<-Corpus(DirSource(directory=m,encoding="ANSI"))

it shows:

Error in DirSource(directory = m, encoding = "ANSI") : empty directory

The file path exists, so why am I running into this problem?

Answer 1

Here is what you need to do:

Change article <- Corpus(DirSource(directory = s.dir, encoding = "ANSI")) to the following:

article <- VCorpus(DirSource(directory = s.dir), readerControl = list(reader = readPlain))
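
One likely reason dropping encoding = "ANSI" helps: the error message suggests that iconv does not recognize "ANSI" as an encoding name. On Windows, "ANSI" usually refers to the system code page (936 in the error above) and is not an encoding name in its own right. The base R sketch below checks which names the local iconv knows. The helper encoding_is_known is a hypothetical name introduced here for illustration, and the candidate names for code page 936 are assumptions about how the files were saved.

```r
# Hypothetical helper: does this R build's iconv know the given encoding name?
encoding_is_known <- function(name) {
  toupper(name) %in% toupper(iconvlist())
}

# Compare the name used in the question with a standard one
encoding_is_known("ANSI")
encoding_is_known("UTF-8")

# Names commonly associated with Windows code page 936 (an assumption here);
# keep only the ones this platform's iconv actually lists
intersect(c("CP936", "GBK", "GB18030"), iconvlist())
```

If the files really are in one of those encodings, passing that name as the encoding argument of DirSource may be an alternative to omitting the argument.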

In the cleanCorpus function, change corpus.tmp <- tm_map(corpus.tmp, tolower) to the following:

corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))

Note the use of the "content_transformer" function.

Once you have done the above, the problem should be resolved.
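
The two changes above can be tried end to end with the minimal sketch below. It assumes the tm package is installed, and it uses a temporary folder with made-up file contents as a stand-in for C:/txt/Obama. The names corpus_dir and single are introduced here for illustration only. The last part touches on the "empty directory" error from the question: DirSource expects a folder, so one way to load a single file is to point at its folder and filter with pattern.

```r
library(tm)

# Hypothetical stand-in for C:/txt/Obama: a temporary folder with two small files
corpus_dir <- file.path(tempdir(), "tm_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("Hello WORLD", file.path(corpus_dir, "1.txt"))
writeLines("Second DOCUMENT", file.path(corpus_dir, "2.txt"))

# DirSource takes a folder; no encoding argument, as in the answer above
article <- VCorpus(DirSource(directory = corpus_dir, pattern = "\\.txt$"),
                   readerControl = list(reader = readPlain))

# Wrap the base function in content_transformer() so the result is
# still a text document rather than a bare character vector
article <- tm_map(article, content_transformer(tolower))
as.character(article[[1]])

# Listing a file path as if it were a folder finds nothing, which is
# the likely source of the "empty directory" error
list.files(file.path(corpus_dir, "1.txt"))

# To load only 1.txt, point at its folder and filter by file name
single <- VCorpus(DirSource(directory = corpus_dir, pattern = "^1\\.txt$"))
length(single)
```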

Answer 2

Go to "cran.r-project.org/web/packages/tm/index.html", download and install an older version of tm, and wait until the bug is fixed.

R: Problem reading text files
