In R, getURL() returns a page complaining about too many requests, but the page is viewable in a browser

Asked: 2016-03-01 19:13:08

Tags: html r

I am trying to fetch this page from www.dotabuff.com.

  library(RCurl)
  url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
  webpage <- getURL(url,verbose = TRUE)

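The result is a page from dotabuff complaining about too many requests. I expected an HTML page with a table, just like the one visible in a web browser. I have tried http, https, getURLContent, and so on.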

I think this has to do with the type of request getURL sends, or there may be something tricky about this site.

1 answer:

Answer 0: (score: 0)

Add a User-Agent header to the request...

library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
# Set a browser-like User-Agent (and verbose logging) for all later RCurl requests
options(RCurlOptions = list(verbose = TRUE, useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"))
webpage <- getURL(url, verbose = TRUE)
# Verbose output from the call above:
*   Trying 23.235.40.64...
* Connected to www.dotabuff.com (23.235.40.64) port 80 (#0)
> GET /heroes/abaddon/matchups HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13
Host: www.dotabuff.com
Accept: */*

< HTTP/1.1 200 OK
...
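
If changing the global RCurlOptions is not desirable, the same header can probably be sent for a single call instead. The following is a minimal sketch, not part of the original answer: the helper name fetch_with_user_agent and its fetch argument are hypothetical, introduced only so the request function can be swapped out. It assumes RCurl is installed and relies on getURL() forwarding the useragent option to libcurl.

```r
# Hypothetical helper: fetch a page while sending a browser-like User-Agent
# for this one request only, leaving the global RCurlOptions untouched.
default_agent <- paste(
  "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)",
  "AppleWebKit/525.13 (KHTML, like Gecko)",
  "Chrome/0.A.B.C Safari/525.13"
)  # same string as in the answer above

fetch_with_user_agent <- function(url,
                                  user_agent = default_agent,
                                  fetch = RCurl::getURL) {
  # 'useragent' is passed to the fetch function as a per-request curl option
  fetch(url, useragent = user_agent)
}

# Example call (performs a real HTTP request, so it is left commented out):
# webpage <- fetch_with_user_agent("http://www.dotabuff.com/heroes/abaddon/matchups")
```

Whether the site accepts the request still depends on its own rate-limiting rules, so a browser-like User-Agent may not be sufficient in every case.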