I am trying to download some data from the Internet to use for text mining in R, but the command fails.
The command is:
url <- 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
arquivo <- read.csv(url)
The error is:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string 1
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
I tried several arguments to the read.csv() function, but without success.
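Answer 0: (score: 1)
This is a plain-text (.txt) document from Project Gutenberg, not a CSV file, so read.csv() is the wrong reader. Use readLines instead: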
url <- 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
arquivo <- readLines(url)
This works for me.
Answer 1: (score: 0)
The readr package from the tidyverse is another option:
arquivo <- readr::read_file(url)
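To illustrate how the two suggestions differ in what they return, here is a minimal sketch. It uses a small temporary file with made-up contents in place of the download (an assumption, so that it runs without network access), and it only calls readr if that package happens to be installed.

```r
# Hypothetical sample text in a temporary file, standing in for the downloaded book
tmp <- tempfile(fileext = ".txt")
writeLines(c("First line", "Second line", "Third line"), tmp)

# readLines() returns a character vector with one element per line
by_line <- readLines(tmp)
n_lines <- length(by_line)

# readr::read_file() returns the whole file as a single string;
# guarded so the sketch still runs when readr is not installed
if (requireNamespace("readr", quietly = TRUE)) {
  whole <- readr::read_file(tmp)
  # Split on line breaks to recover a per-line vector if needed
  from_whole <- strsplit(whole, "\r?\n")[[1]]
}
```

Which shape is more convenient probably depends on the text-mining step that follows: a per-line vector is easy to index, while a single string is easy to pass to a tokenizer.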
Answer 2: (score: 0)
This:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string 1
In addition: Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
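tells you that the stream contains non-text data. On inspection, it appears to be a gzip-encoded stream, which a web browser decodes on the fly to display plain text. R may not do that automatically. You can get a plain-text version from this URL: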
> txt = readLines("http://www.gutenberg.org/files/100/100-0.txt")
> txt[14532]
[1] "ADRIANA. To fetch my poor distracted husband hence."
> txt[143532]
[1] " He looks like sooth. He says he loves my daughter;"
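If you want to check for gzip encoding yourself, one possible approach is to look at the first two bytes, since gzip data starts with 0x1f 0x8b, and to read through gzcon() when they match. The sketch below is only an illustration under assumptions: it builds a small gzip file locally to stand in for the downloaded bytes, and the variable names are made up.

```r
# Build a small gzip-compressed file locally to stand in for the downloaded stream
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "w")
writeLines("ADRIANA. To fetch my poor distracted husband hence.", con)
close(con)

# Read the first two raw bytes; gzip data begins with 0x1f 0x8b
magic <- readBin(tmp, what = "raw", n = 2)
is_gzip <- identical(magic, as.raw(c(0x1f, 0x8b)))

# If compressed, wrap a binary connection in gzcon() so readLines() sees plain text
if (is_gzip) {
  bin <- gzcon(file(tmp, "rb"))
  txt <- readLines(bin)
  close(bin)
} else {
  txt <- readLines(tmp)
}
```

For a remote file, the same idea would presumably be written as readLines(gzcon(url(u, "rb"))), where u is the address in question; I have not verified that against the Gutenberg cache URL from the question.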