Why does RCurl::url.exists fail to test a server with a permanent redirect?

Time: 2017-11-06 11:25:17

Tags: r rcurl

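I use the RCurl::url.exists function to test a remote host before opening a connection to it. It works fine for most of my hosts, but https://eidoo.io/ sends the function into an infinite loop.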

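The function uses RCurl::curlPerform to determine whether a request for a given URL gets a response without error, asking the server not to return the body. It only processes the headers.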

library(RCurl)
url <- "https://eidoo.io/"
url.exists(url) # This will crash your RStudio
curlPerform(url = url, followlocation = TRUE, nobody = TRUE) # This will crash your RStudio as well

How can I test this website before opening a connection to it?

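Here is an excerpt of the verbose output from running curlPerform() against https://eidoo.io/. Unfortunately the beginning of the log is missing, but HTTP/1.1 301 Moved Permanently seems to indicate a permanent redirect.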

< HTTP/1.1 301 Moved Permanently
< Date: Mon, 06 Nov 2017 10:53:36 GMT
< Content-Type: text/html
< Connection: keep-alive
< Set-Cookie: __cfduid=d41359f0407b83fef208fca7ea017c5d61509965616; expires=Tue, 06-Nov-18 10:53:36 GMT; path=/; domain=.eidoo.io; HttpOnly; Secure
< Location: https://eidoo.io/404
< X-Frame-Options: SAMEORIGIN
< Allow: GET, POST
< Strict-Transport-Security: max-age=0
< Server: cloudflare-nginx
< CF-RAY: 3b97828fcc8f090e-CDG
< 
* Connection #7 to host eidoo.io left intact
* Issue another request to this URL: 'https://eidoo.io/404'
* Found bundle for host eidoo.io: 0x7dde110 [can pipeline]
* Re-using existing connection! (#7) with host eidoo.io
* Connected to eidoo.io (104.25.57.118) port 443 (#7)
> HEAD /404 HTTP/1.1
Host: eidoo.io
Accept: */*

< HTTP/1.1 301 Moved Permanently
< Date: Mon, 06 Nov 2017 10:53:36 GMT
< Content-Type: text/html
< Connection: keep-alive
< Set-Cookie: __cfduid=d41359f0407b83fef208fca7ea017c5d61509965616; expires=Tue, 06-Nov-18 10:53:36 GMT; path=/; domain=.eidoo.io; HttpOnly; Secure
< Location: https://eidoo.io/404
< X-Frame-Options: SAMEORIGIN
< Allow: GET, POST
< Strict-Transport-Security: max-age=0
< Server: cloudflare-nginx
< CF-RAY: 3b97828ffca5090e-CDG
< 
* Connection #7 to host eidoo.io left intact
* Issue another request to this URL: 'https://eidoo.io/404'
* Found bundle for host eidoo.io: 0x7dde110 [can pipeline]
* Re-using existing connection! (#7) with host eidoo.io
* Connected to eidoo.io (104.25.57.118) port 443 (#7)
> HEAD /404 HTTP/1.1
Host: eidoo.io
Accept: */*

< HTTP/1.1 301 Moved Permanently
< Date: Mon, 06 Nov 2017 10:53:36 GMT
< Content-Type: text/html
< Connection: keep-alive
< Set-Cookie: __cfduid=d41359f0407b83fef208fca7ea017c5d61509965616; expires=Tue, 06-Nov-18 10:53:36 GMT; path=/; domain=.eidoo.io; HttpOnly; Secure
< Location: https://eidoo.io/404
< X-Frame-Options: SAMEORIGIN
< Allow: GET, POST
< Strict-Transport-Security: max-age=0
< Server: cloudflare-nginx
< CF-RAY: 3b9782902cb7090e-CDG
< 
* Connection #7 to host eidoo.io left intact
* Issue another request to this URL: 'https://eidoo.io/404'
* Found bundle for host eidoo.io: 0x7dde110 [can pipeline]
* Re-using existing connection! (#7) with host eidoo.io
* Connected to eidoo.io (104.25.57.118) port 443 (#7)
> HEAD /404 HTTP/1.1
Host: eidoo.io
Accept: */*
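
Reading the excerpt, every HEAD /404 request is answered with a 301 whose Location header is again https://eidoo.io/404, so the redirect target is the URL that was just requested and the chain never ends. A small sketch of that check follows; `is_self_redirect` is a made-up helper name for illustration, not something RCurl provides.

```r
# Illustrative helper (not part of RCurl): TRUE when a redirect's Location
# header points back at the URL that was just requested.
is_self_redirect <- function(requested_url, location) {
  # Ignore trailing slashes only; paths stay case-sensitive.
  strip_slash <- function(u) sub("/+$", "", u)
  identical(strip_slash(requested_url), strip_slash(location))
}

# Values copied from the two logs in this question.
is_self_redirect("https://eidoo.io/404", "https://eidoo.io/404")
is_self_redirect("http://www.google.com/",
                 "http://www.google.fr/?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA")
```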

By contrast, when I run the same command against www.google.com, everything works fine:

> curlPerform(url = "www.google.com", followlocation = TRUE, nobody = TRUE, verbose = TRUE)
* Rebuilt URL to: www.google.com/
*   Trying 216.58.212.164...
* Connected to www.google.com (216.58.212.164) port 80 (#0)
> HEAD / HTTP/1.1
Host: www.google.com
Accept: */*

< HTTP/1.1 302 Found
< Cache-Control: private
< Content-Type: text/html; charset=UTF-8
< Referrer-Policy: no-referrer
< Location: http://www.google.fr/?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA
< Content-Length: 268
< Date: Mon, 06 Nov 2017 11:07:36 GMT
< 
* Connection #0 to host www.google.com left intact
* Issue another request to this URL: 'http://www.google.fr/?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA'
*   Trying 216.58.212.163...
* Connected to www.google.fr (216.58.212.163) port 80 (#1)
> HEAD /?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA HTTP/1.1
Host: www.google.fr
Accept: */*

< HTTP/1.1 200 OK
< Date: Mon, 06 Nov 2017 11:07:36 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
< Server: gws
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Set-Cookie: 1P_JAR=2017-11-06-11; expires=Mon, 13-Nov-2017 11:07:36 GMT; path=/; domain=.google.fr
< Set-Cookie: NID=116=2OlLs4BCZcDE1a3y6m-ZWn2Kvp0_rWGxH5XQTOw_pwZOeNn1QisFEpXkrLvxYdKAp2MX0Ff4G0ELoymvR2xVeYM0EjPeVi9LwIqX0x4LTHkPfKHaPt0itOcDXD18_vaG; expires=Tue, 08-May-2018 11:07:36 GMT; path=/; domain=.google.fr; HttpOnly
< Transfer-Encoding: chunked
< Accept-Ranges: none
< Vary: Accept-Encoding
< 
* Connection #1 to host www.google.fr left intact
OK 
 0
>

1 Answer:

Answer 0: (score: 0)

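I am answering my own question with a workaround. The problem is related to the HTTP redirect: the server keeps redirecting https://eidoo.io/404 to itself, so following the Location header never terminates. Telling curl not to follow redirects works fine: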

curlPerform(url = url, followlocation = FALSE, nobody = TRUE)
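
If ordinary redirects should still be followed, another option might be to cap the number of hops instead of disabling them. The sketch below assumes that RCurl passes `maxredirs` through to libcurl's CURLOPT_MAXREDIRS, and that the libcurl build in use follows redirects without a limit by default (which, as far as I know, older versions do). `url_reachable`, `max_redirects` and `perform` are illustrative names; `perform` exists only so the function can be exercised without network access.

```r
# Sketch only: a header-only check that cannot loop forever.
# `url_reachable`, `max_redirects` and `perform` are hypothetical names,
# not part of RCurl.
url_reachable <- function(url, max_redirects = 5L,
                          perform = RCurl::curlPerform) {
  tryCatch({
    perform(
      url = url,
      nobody = TRUE,             # header-only request, no body
      followlocation = TRUE,     # still follow ordinary redirects
      maxredirs = max_redirects  # give up after this many hops
    )
    TRUE
  },
  # Any curl error, including exceeding the redirect cap, counts as a failure.
  error = function(e) FALSE)
}

# Usage (needs RCurl installed and network access):
# url_reachable("https://eidoo.io/")
```

Note that TRUE here only means the request completed without a curl error; it does not check that the final status code is in the 2xx range.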