Question

I want to use download.file() to read online data into R, as shown below.

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")

Someone suggested that I add the line setInternet2(TRUE), but it still doesn't work.

The error I get is:

Warning messages:
1: running command 'curl  "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"  -o "./data/data.csv"' had status 127 
2: In download.file(URL, destfile = "./data/data.csv", method = "curl",  :
  download had nonzero exit status

Thanks for your help.

Answer 1

It might be easiest to try the RCurl package. Install the package and try the following:

# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or 
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
#   RT SERIALNO DIVISION PUMA REGION ST
# 1  H      186        8  700      4 16
# 2  H      306        8  700      4 16
# 3  H      395        8  100      4 16
# 4  H      506        8  700      4 16
# 5  H      835        8  800      4 16
# 6  H      989        8  700      4 16
dim(out)
# [1] 6496  188

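Alternatively, download.file() with method = "libcurl" may work (this method is available in R 3.2.0 and later):

download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",
              destfile = "reviews.csv", method = "libcurl")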

Answer 2

Here's an update as of November 2014. I found that setting method='curl' did the trick for me (while method='auto' did not).

For example:

# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip')

# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='auto')

# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='curl')
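If it's unclear which method will work on a given machine, one option is to probe what is available before calling download.file(). The following is only a sketch under assumptions: pick_download_method is a hypothetical helper name (not part of R), and it relies on capabilities("libcurl"), which only reports TRUE on R 3.2.0 or later builds with libcurl support.

```r
# Hypothetical helper: choose a download.file() method from what this
# R installation appears to offer. Not part of base R.
pick_download_method <- function() {
  # Built-in libcurl support (R >= 3.2.0); isTRUE() guards against an
  # empty or NA result on older versions.
  if (isTRUE(unname(capabilities("libcurl")))) {
    return("libcurl")
  }
  # External curl binary on the PATH, which method = "curl" shells out to.
  if (nzchar(Sys.which("curl"))) {
    return("curl")
  }
  # Otherwise leave the choice to R.
  "auto"
}

chosen <- pick_download_method()
message("Would call download.file(..., method = \"", chosen, "\")")
```

The result can then be passed along, e.g. download.file(url, destfile, method = pick_download_method()).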

Answer 3

I have had success with the following code:

url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)

Note that I changed the protocol from https to http, since the former doesn't seem to be supported in R.
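As a small illustration of that protocol swap, the scheme can be rewritten programmatically rather than by hand. This is just a sketch; the variable names are made up here, and it only helps when the server also serves the file over plain HTTP (and gives up the encryption that https provides).

```r
# Rewrite only the leading scheme; the rest of the URL is left untouched.
secure_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
plain_url <- sub("^https://", "http://", secure_url)
print(plain_url)
# [1] "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
```

plain_url can then be handed to read.csv(file = plain_url) as in the code above.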

Answer 4

If you use RCurl and get an SSL error from the getURL() function, set these options before calling getURL(). This sets the CurlSSL settings globally.

The extended code:

install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))   
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)

Worked for me on Windows 7 64-bit using R 3.1.0!
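One thing worth checking is whether the certificate bundle is actually where that option points. system.file() returns an empty string when the file (or the RCurl package itself) cannot be found, so a guarded version might look like the sketch below. The variable name ca_bundle is just for illustration.

```r
# Locate the CA bundle shipped with RCurl; "" means it was not found
# (for example because RCurl is not installed).
ca_bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

if (nzchar(ca_bundle)) {
  # Same global setting as above, applied only when the file exists.
  options(RCurlOptions = list(cainfo = ca_bundle))
  message("Using CA bundle: ", ca_bundle)
} else {
  message("cacert.pem not found; RCurlOptions left unchanged")
}
```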

Answer 5

Status 127 means "command not found".

In your case, the curl command was not found, which means curl is either not installed or not on your PATH.

You need to install or reinstall curl. That's all. Get the latest version for your OS from http://curl.haxx.se/download.html

Close RStudio before installing.
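To confirm from within R whether the curl binary is visible, Sys.which() can be used. A minimal check, with curl_path as an illustrative variable name:

```r
# Sys.which() returns the full path of an executable on the PATH,
# or "" when it cannot be found.
curl_path <- Sys.which("curl")

if (nzchar(curl_path)) {
  message("curl found at: ", curl_path)
} else {
  message("curl is not on the PATH, so method = \"curl\" will likely fail")
}
```

If this reports that curl is missing right after installing it, restarting R so that it picks up the updated PATH may help.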

Answer 6

I had exactly the same problem as UseR (the original question), and I am also on Windows 7. I tried all the proposed solutions, but none of them worked.

I solved the problem as follows:

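Using RStudio instead of the R console.
Updating R (from 3.1.0 to 3.1.1) so that the RCurl library runs properly on it. (I am now using R 3.1.1 32-bit, although my system is 64-bit.)
Typing the URL with https (secure connection) and with forward slashes / instead of backslashes \\.
Setting method = "auto".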

It works for me now. You should see the message:

Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 bytes

Answer 7

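I'm offering the curl package as an alternative, which I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found that it halved the transfer times and was much more reliable than download.file.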

#install.packages("curl")
library(curl)
#install.packages("RCurl")
library(RCurl)

ptm <- proc.time()
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
proc.time() - ptm
ptm

ptm1 <- proc.time()
curl_download(url =URL ,destfile="TEST.CSV",quiet=FALSE, mode="wb")
proc.time() - ptm1
ptm1

ptm2 <- proc.time()
y = download.file(URL, destfile = "./data/data.csv", method="curl")
proc.time() - ptm2
ptm2

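In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website cut my transfer times from 2000 seconds per file to 1000 seconds and improved reliability from a 50% failure rate to 2 failures in 120 files. The script is posted in my answer to a question I asked earlier.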
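That script isn't reproduced here. Independently of which download function is used, a simple way to improve reliability over many files is to retry failed transfers. The following is only a sketch of that general idea, not the script mentioned above: download_with_retry and its arguments are hypothetical names, and fetch is assumed to be any function taking (url, destfile), such as download.file or curl::curl_download.

```r
# Hypothetical helper: call fetch(url, destfile) until it completes without
# an error or warning, up to max_tries times. Returns the attempt number
# that succeeded, or stops with an error if every attempt failed.
download_with_retry <- function(url, destfile, fetch = utils::download.file,
                                max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    ok <- tryCatch({
      fetch(url, destfile)
      TRUE
    },
    error = function(e) FALSE,
    warning = function(w) FALSE)  # e.g. "download had nonzero exit status"
    if (ok) {
      return(attempt)
    }
    Sys.sleep(wait)  # optional pause (in seconds) before the next try
  }
  stop("download failed after ", max_tries, " attempts: ", url)
}

# Offline demonstration with a fake fetch function that fails twice.
calls <- 0
flaky_fetch <- function(url, destfile) {
  calls <<- calls + 1
  if (calls < 3) stop("simulated network failure")
  0L
}
attempts_needed <- download_with_retry("http://example.com/file.csv",
                                       "file.csv", fetch = flaky_fetch,
                                       max_tries = 5)
print(attempts_needed)
# [1] 3
```

For real use, something like download_with_retry(URL, "TEST.CSV", fetch = curl::curl_download, wait = 5) could be called inside a loop over the file list.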

Answer 8

Try the following for large files:

library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)

Answer 9

You can set the global option and try:

options('download.file.method'='curl')
# method is left out so that the global option above takes effect
download.file(URL, destfile = "./data/data.csv")

For details on the issue, see this link: https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html
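As far as I can tell, download.file() only consults the download.file.method option when the method argument is not supplied, so it may be worth checking what the option is currently set to. A small sketch that sets the option, reads it back, and then restores the previous value (old and current are illustrative names):

```r
# options() returns the previous values, which makes restoring them easy.
old <- options(download.file.method = "curl")

# Read the option back to confirm what download.file() would pick up.
current <- getOption("download.file.method")
print(current)
# [1] "curl"

# Restore whatever was set before (possibly nothing at all).
options(old)
```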

Using download.file() to download a file from HTTPS
