Question

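I am trying to download a file from an https page that requires clicking an "I Agree" button and then stores a cookie. My apologies if the answer is obvious somewhere...

When I open the web page directly in Chrome and click "I Agree", the file starts downloading automatically.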

http://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?path=SAMHDA&study=32722&bundle=delimited&ds=1&dups=yes

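I tried to replicate this example, but I don't think the hangseng website actually stores a cookie or authentication, so I don't know whether that example covers everything I need.

On top of that, I believe SSL complicates the authentication, since I think the getURL() call will need a certificate specification such as cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl").

I'm too much of a beginner with RCurl to tell whether this website is genuinely difficult or whether I'm just missing something obvious.

Thanks!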

Answer 1

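This is a bit easier with httr, because it sets everything up so that cookies and https work seamlessly.

The easiest way to generate the cookies is to have the site do it for you, by manually posting the information that the "I agree" form generates. You then make a second request to download the actual file.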

library(httr)
terms <- "http://www.icpsr.umich.edu/cgi-bin/terms"
download <- "http://www.icpsr.umich.edu/cgi-bin/bob/zipcart2"

values <- list(agree = "yes", path = "SAMHDA", study = "32722", ds = "", 
  bundle = "all", dups = "yes")

# Accept the terms on the form, 
# generating the appropriate cookies
POST(terms, body = values)
GET(download, query = values)

# Actually download the file (this will take a while)
resp <- GET(download, query = values)

# write the content of the download to a binary file
writeBin(content(resp, "raw"), "c:/temp/thefile.zip")
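
As a possible variation on the code above, the same request sequence can be wrapped in a function that checks each HTTP status and streams the archive to disk rather than holding it in memory. This is only a sketch: the function name `download_with_terms` and its argument names are hypothetical, it assumes the httr package is installed, and it assumes the site still behaves as described in the answer.

```r
# Sketch only: `download_with_terms` and its argument names are hypothetical
# helpers for illustration, not part of httr.
download_with_terms <- function(terms_url, download_url, values, dest) {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("the httr package is required")
  }

  # Submit the "I agree" form. httr reuses cookies for requests to the
  # same host within an R session, so later requests carry them along.
  agreed <- httr::POST(terms_url, body = values)
  httr::stop_for_status(agreed)

  # Mirror the answer's first GET, which is part of generating the cookies.
  first <- httr::GET(download_url, query = values)
  httr::stop_for_status(first)

  # Request the file again and write the response body straight to `dest`.
  resp <- httr::GET(
    download_url,
    query = values,
    httr::write_disk(dest, overwrite = TRUE)
  )
  httr::stop_for_status(resp)

  invisible(dest)
}

# Example call, reusing the objects defined in the answer above:
# download_with_terms(terms, download, values, "c:/temp/thefile.zip")
```

Calling stop_for_status() after each step turns a failed form submission or a rejected download into an R error, instead of silently writing an HTML error page to the .zip file.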

How to use R to download a zipped file from an SSL page that requires cookies
