I am trying to fetch this page from www.dotabuff.com.
library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
webpage <- getURL(url,verbose = TRUE)
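The result is a page from dotabuff complaining about too many requests. I expected an HTML page with a table, like the one shown in a web browser. I have tried http, https, getURLContent, and so on.
I think this has to do with the type of request getURL sends, or there may be something tricky about this particular site.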
Answer 0 (score: 0)
Add a header to the request (here, a browser-like User-Agent):
library(RCurl)
url <- "http://www.dotabuff.com/heroes/abaddon/matchups"
# Set a browser-like User-Agent (and verbose logging) for all subsequent RCurl requests
options(RCurlOptions = list(verbose = TRUE, useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"))
webpage <- getURL(url,verbose = TRUE)
* Trying 23.235.40.64...
* Connected to www.dotabuff.com (23.235.40.64) port 80 (#0)
> GET /heroes/abaddon/matchups HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13
Host: www.dotabuff.com
Accept: */*
< HTTP/1.1 200 OK
...
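As a variation, the User-Agent can be passed to a single call instead of being set globally through options(). The following is only a minimal sketch: fetch_with_ua is a hypothetical helper name (not part of RCurl), and it assumes the site still answers a browser-like User-Agent the way the log above shows.

```r
library(RCurl)

# fetch_with_ua is a hypothetical helper, not an RCurl function.
# It passes the User-Agent as a curl option for this one request only,
# leaving the global RCurlOptions untouched.
fetch_with_ua <- function(url, ua) {
  getURL(url, useragent = ua)
}

# Usage (performs a network request, so it is left commented out):
# ua <- "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13"
# webpage <- fetch_with_ua("http://www.dotabuff.com/heroes/abaddon/matchups", ua)
```

Keeping the option local to the call avoids changing the behaviour of other RCurl requests in the same session.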