Question

I am trying to download some data from the internet to use for text mining in R, but the command fails when I run it.

The command is:

url <- 'http://www.gutenberg.org/cache/epub/100/pg100.txt' 
arquivo <- read.csv(url)

The error is:

Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string 1
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls

I tried several arguments to the read.csv() function, without success.

Answer 1

This is a plain-text (.txt) document from Project Gutenberg, so read it with readLines rather than read.csv:

url <- 'http://www.gutenberg.org/cache/epub/100/pg100.txt' 
arquivo <- readLines(url)

This works for me.
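
As a minimal sketch of why readLines suits prose better than read.csv, the following uses a small temporary file in place of the downloaded book. The file and its two sample lines are made up for illustration; no network access is needed.

```r
# Hypothetical stand-in for the downloaded text: two lines of prose.
tmp <- tempfile(fileext = ".txt")
writeLines(c("To be, or not to be, that is the question:",
             "Whether 'tis nobler in the mind to suffer"), tmp)

# readLines() returns a character vector with one element per line.
lines <- readLines(tmp)
n_lines <- length(lines)

# read.csv() treats every comma as a column separator, so the prose
# is split into columns that have no meaning for text mining.
csv <- read.csv(tmp, header = FALSE)
n_cols <- ncol(csv)

unlink(tmp)
```

Here lines keeps each line of text intact, while csv ends up with several columns simply because the first line contains commas.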

Answer 2

The readr package from the tidyverse is another option; read_file() returns the whole file as a single string:

arquivo <- readr::read_file(url)

Answer 3

This error:

Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string 1
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls

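tells you that there is non-text data in the stream. On inspection, it appears to be a gzip-encoded stream, which a web browser decodes on the fly to render plain text. R apparently does not do this. You can get a plain-text version from this URL: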

> txt = readLines("http://www.gutenberg.org/files/100/100-0.txt")
> txt[14532]
[1] "ADRIANA. To fetch my poor distracted husband hence."
> txt[143532]
[1] "    He looks like sooth. He says he loves my daughter;"
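
To illustrate the point about compressed streams, here is a small sketch that needs no network access. It assumes a locally created gzip file (a made-up stand-in, not the Gutenberg download itself) and shows how to check for the gzip magic bytes and decompress the file explicitly with gzfile().

```r
# Create a small gzip-compressed file as a hypothetical stand-in
# for a compressed download.
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, "w")
writeLines(c("ADRIANA. To fetch my poor distracted husband hence.",
             "A second line of text."), con)
close(con)

# Read the first two raw bytes; gzip data starts with 0x1f 0x8b.
magic <- readBin(tmp, what = "raw", n = 2)
is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

# gzfile() decompresses while reading, so readLines() sees plain text.
con <- gzfile(tmp, "r")
txt <- readLines(con)
close(con)

unlink(tmp)
```

If a downloaded file fails this kind of check in the other direction (it starts with 1f 8b when plain text was expected), saving it with download.file(..., mode = "wb") and reading it through gzfile() is one possible workaround.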

Error when downloading data from the internet in R