httr GET function runs out of space when downloading a large file

Time: 2013-06-25 20:15:47

Tags: r web-scraping rcurl httr

I am trying to download a 1.1-gigabyte file with httr, but I hit the following error:

x <- GET( extract.path )
Error in curlPerform(curl = handle$handle, .opts = curl_opts$values) : 
  cannot allocate more space: 1728053248 bytes

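My C drive has 400 GB free.

In the RCurl package, I see the maxfilesize and maxfilesize.large options when using getCurlOptionsConstants(), but I don't understand if/how these options might be passed through to httr via config or set_config, or whether I need to switch over to RCurl for this. And even if I do need to switch, will increasing the maximum file size work?

Here is my sessionInfo: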

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] XML_3.96-1.1 httr_0.2    

loaded via a namespace (and not attached):
[1] digest_0.6.0   RCurl_1.95-4.1 stringr_0.6.2  tools_3.0.0   

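And (this is not recommended, only because it will take you a while) if you want to reproduce my error, you can go to https://usa.ipums.org/usa-action/samples, register a new account, choose the 2011 5-year ACS extract, add about a hundred variables, and then wait for the extract to be ready. Then edit the first three lines and run the code below. (Again, not recommended.)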

your.email <- "email@address.com"
your.password <- "password"
extract.path <- "https://usa.ipums.org/usa-action/downloads/extract_files/some_file.csv.gz"

require(httr)

values <- 
    list(
        "login[email]" = your.email , 
        "login[password]" = your.password , 
        "login[is_for_login]" = 1
    )

POST( "https://usa.ipums.org/usa-action/users/validate_login" , body = values )
GET( "https://usa.ipums.org/usa-action/extract_requests/download" , query = values )

# this line breaks
x <- GET( extract.path )

2 Answers:

Answer 0: (score: 2)

FYI, this has been added to httr as the write_disk() control: https://github.com/hadley/httr/blob/master/man/write_disk.Rd
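As a minimal sketch of how that control might be used (the helper name download_to_disk and its arguments are made up for illustration, and this assumes an httr version recent enough to ship write_disk(), which the httr 0.2 in the question is not), the response body is streamed to a file instead of being held in memory:

```r
# Hypothetical helper, not part of httr: stream a response body to disk.
# Assumes a version of httr that provides write_disk() and progress().
download_to_disk <- function(url, dest) {
  resp <- httr::GET(
    url,
    httr::write_disk(dest, overwrite = TRUE),  # write the body to `dest`, not to memory
    httr::progress()                           # optional progress bar for long downloads
  )
  httr::stop_for_status(resp)                  # raise an R error on HTTP 4xx/5xx
  invisible(dest)
}
```

In the question's script this would presumably replace the final `x <- GET( extract.path )` line, for example `download_to_disk( extract.path , "extract.csv.gz" )`, where the local file name is only a placeholder.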

Answer 1: (score: 1)

GET calls httr:::make_request, which sets the curl options defined in config = list(). However, it seems the writefunction option is hard-coded in httr:

opts$writefunction <- getNativeSymbolInfo("R_curl_write_binary_data")$address

You will probably need to use RCurl and define an appropriate `writefunction`. The solution from @Martin Morgan in "Create a C-level file handle in RCurl for writing downloaded files" seems to be the way to go.
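A rough sketch of that approach, assuming RCurl's CFILE() and curlPerform() behave as described in the linked answer (the wrapper name download_with_cfile is made up here, and the IPUMS login step from the question is not handled):

```r
library(RCurl)

# Hypothetical wrapper: let libcurl write straight to a C-level file handle,
# so the 1.1 GB body never has to fit into R's memory.
download_with_cfile <- function(url, dest, ...) {
  f <- CFILE(dest, mode = "wb")   # open a C-level FILE for binary writing
  on.exit(close(f))               # close the file even if the transfer fails
  # `...` is forwarded to curlPerform(), e.g. curl = <a handle carrying login cookies>
  curlPerform(url = url, writedata = f@ref, ...)
  invisible(dest)
}
```

For the question's case, the authenticated session would have to be carried over as well, for instance by passing an existing curl handle through `...`; that part is left as an assumption.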