I'm trying to use the RCurl package to pull a data table from a website. My code works for the URL you reach by clicking through the site:
http://statsheet.com/mcb/teams/air-force/game_stats/
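As soon as I select a previous season (which is what I actually want), my code stops working.
Example link: http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013
My guess is that this has to do with reserved characters in the season-specific address. I have tried URLencode as well as encoding the address by hand, but neither worked.
My code: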
library(RCurl)
library(XML)
#Define URL
theurl <- URLencode("http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013", reserved=TRUE)
webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)
# Extract table header and contents
tablehead <- xpathSApply(pagetree, "//*/table[1]/thead[1]/tr[2]/th", xmlValue)
results <- xpathSApply(pagetree,"//*/table[1]/tbody/tr/td", xmlValue)
content <- as.data.frame(matrix(results, ncol = 19, byrow = TRUE))
testtablehead <- c("W/L","Opponent",tablehead[c(2:18)])
names(content) <- testtablehead
The relevant error returned by R:
Error in function (type, msg, asError = TRUE) :
Could not resolve host: http%3a%2f%2fstatsheet.com%2fmcb%2fteams%2fair-force%2fgame_stats%3fseason%3d2012-2013; No data record of requested type
Does anyone know what the problem is and how to fix it?
Answer 0 (score: 1)
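Skip the encoding, which isn't needed here, and pass the URL straight to the parser. With reserved=TRUE, URLencode also escapes the ":" and "/" that separate the scheme, host and path, so the whole string gets treated as a host name, which is exactly what the "Could not resolve host" error shows: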
library(XML)
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"
pagetree <- htmlTreeParse(url, useInternalNodes = TRUE)
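To see why the encoded address fails, it can help to compare what URLencode produces with and without reserved=TRUE. The following is a minimal sketch in base R that needs no network access; the variable names (base_url, season, encoded_all and so on) are made up for illustration. It also shows one possible way to build the address if a query value ever did need escaping: encode only the value, not the whole URL.

```r
# URLencode lives in the utils package, which is attached by default.
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

# reserved = TRUE escapes ":", "/", "?" and "=" as well, so the scheme and
# host separators disappear and the result is no longer a usable address.
encoded_all <- URLencode(url, reserved = TRUE)

# The default (reserved = FALSE) leaves those characters alone; this URL
# contains nothing else that needs escaping, so it comes back unchanged.
encoded_default <- URLencode(url)

# If a query value might contain special characters, encode just that value
# and paste it onto the untouched base address (illustrative names).
base_url <- "http://statsheet.com/mcb/teams/air-force/game_stats"
season <- "2012-2013"
built_url <- paste0(base_url, "?season=", URLencode(season, reserved = TRUE))

print(encoded_all)
print(encoded_default)
print(built_url)
```

Here built_url ends up identical to the plain URL, because "2012-2013" contains only digits and a hyphen, neither of which is escaped.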